Air Quality Measure based on AQI¶

  • An air quality index (AQI) is used by government agencies to communicate to the public how polluted the air currently is or how polluted it is forecast to become. Public health risks increase as the AQI rises.

  • There are six AQI categories, namely Good, Satisfactory, Moderately polluted, Poor, Very Poor, and Severe. The proposed AQI will consider eight pollutants (PM10, PM2.5, NO2, SO2, CO, O3, NH3, and Pb) for which short-term (up to 24-hourly averaging period) National Ambient Air Quality Standards are prescribed.

  • Based on the measured ambient concentrations, the corresponding standards, and the likely health impact, a sub-index is calculated for each of these pollutants. The worst (highest) sub-index is taken as the overall AQI. Likely health impacts for the different AQI categories and pollutants have also been suggested, with primary inputs from the medical experts in the group.

  • The AQI values and corresponding ambient concentrations (health breakpoints) as well as associated likely health impacts for the identified eight pollutants are as follows:

aqi.png

  • Associated Health Impacts

aqi_effects.png
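
The sub-index calculation described above can be sketched as a linear interpolation between health breakpoints, with the overall AQI taken as the highest sub-index. This is only an illustration: the PM2.5 breakpoints below are an assumption based on the commonly cited CPCB 24-hour values (the breakpoint figure itself is not reproduced in this text), the upper bound of the Severe band is an assumed cap, and the names `sub_index` and `overall_aqi` are hypothetical.

```python
# Breakpoints as (conc_low, conc_high, index_low, index_high).
# ASSUMPTION: PM2.5 24-hour values in µg/m³; verify against the AQI table.
PM25_BREAKPOINTS = [
    (0, 30, 0, 50),        # Good
    (30, 60, 50, 100),     # Satisfactory
    (60, 90, 100, 200),    # Moderately polluted
    (90, 120, 200, 300),   # Poor
    (120, 250, 300, 400),  # Very Poor
    (250, 380, 400, 500),  # Severe (380 is an assumed upper cap)
]


def sub_index(conc, breakpoints):
    """Linearly interpolate a concentration onto the AQI scale."""
    if conc is None or conc < 0:
        return None
    for c_lo, c_hi, i_lo, i_hi in breakpoints:
        if conc <= c_hi:
            return i_lo + (i_hi - i_lo) * (conc - c_lo) / (c_hi - c_lo)
    # Beyond the last breakpoint: cap at the top of the scale.
    return float(breakpoints[-1][3])


def overall_aqi(sub_indices):
    """Overall AQI is the worst (highest) available sub-index."""
    valid = [s for s in sub_indices if s is not None]
    return max(valid) if valid else None


pm25_index = sub_index(45, PM25_BREAKPOINTS)   # falls in the Satisfactory band
aqi = overall_aqi([pm25_index, None, 120.5])   # None marks a missing pollutant
```

Under these assumed breakpoints, a PM2.5 concentration of 45 µg/m³ sits halfway through the 30 to 60 band, so its sub-index is halfway between 50 and 100.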

About The Project¶

  • In this project we focused mainly on cleaning the data, then used several libraries to visualise the collected data and draw conclusions from it.

  • We visualised the yearly data for every pollutant and identified the most and least polluted locations in the dataset, both station-wise and city-wise.

  • Finally, we carried out a hypothesis test comparing the quality of air before and after COVID-19.

Libraries¶

Data Collection¶

Data collected from: https://www.kaggle.com/rohanrao/air-quality-data-in-india

Data Preprocessing and Cleaning¶

Cleaning of Stations' Data to remove the missing values¶

StationId StationName City State Status
0 AP001 Secretariat, Amaravati - APPCB Amaravati Andhra Pradesh Active
1 AP002 Anand Kala Kshetram, Rajamahendravaram - APPCB Rajamahendravaram Andhra Pradesh NaN
2 AP003 Tirumala, Tirupati - APPCB Tirupati Andhra Pradesh NaN
3 AP004 PWD Grounds, Vijayawada - APPCB Vijayawada Andhra Pradesh NaN
4 AP005 GVM Corporation, Visakhapatnam - APPCB Visakhapatnam Andhra Pradesh Active
... ... ... ... ... ...
225 WB010 Jadavpur, Kolkata - WBPCB Kolkata West Bengal Active
226 WB011 Rabindra Bharati University, Kolkata - WBPCB Kolkata West Bengal Active
227 WB012 Rabindra Sarobar, Kolkata - WBPCB Kolkata West Bengal Active
228 WB013 Victoria, Kolkata - WBPCB Kolkata West Bengal Active
229 WB014 Ward-32 Bapupara, Siliguri - WBPCB Siliguri West Bengal NaN

230 rows × 5 columns

StationId StationName City State Status
0 AP001 Secretariat, Amaravati - APPCB Amaravati Andhra Pradesh Active
4 AP005 GVM Corporation, Visakhapatnam - APPCB Visakhapatnam Andhra Pradesh Active
5 AS001 Railway Colony, Guwahati - APCB Guwahati Assam Active
10 BR005 DRM Office Danapur, Patna - BSPCB Patna Bihar Active
11 BR006 Govt. High School Shikarpur, Patna - BSPCB Patna Bihar Active
... ... ... ... ... ...
224 WB009 Fort William, Kolkata - WBPCB Kolkata West Bengal Active
225 WB010 Jadavpur, Kolkata - WBPCB Kolkata West Bengal Active
226 WB011 Rabindra Bharati University, Kolkata - WBPCB Kolkata West Bengal Active
227 WB012 Rabindra Sarobar, Kolkata - WBPCB Kolkata West Bengal Active
228 WB013 Victoria, Kolkata - WBPCB Kolkata West Bengal Active

133 rows × 5 columns

StationId StationName City State
0 AP001 Secretariat, Amaravati - APPCB Amaravati Andhra Pradesh
1 AP005 GVM Corporation, Visakhapatnam - APPCB Visakhapatnam Andhra Pradesh
2 AS001 Railway Colony, Guwahati - APCB Guwahati Assam
3 BR005 DRM Office Danapur, Patna - BSPCB Patna Bihar
4 BR006 Govt. High School Shikarpur, Patna - BSPCB Patna Bihar
... ... ... ... ...
128 WB009 Fort William, Kolkata - WBPCB Kolkata West Bengal
129 WB010 Jadavpur, Kolkata - WBPCB Kolkata West Bengal
130 WB011 Rabindra Bharati University, Kolkata - WBPCB Kolkata West Bengal
131 WB012 Rabindra Sarobar, Kolkata - WBPCB Kolkata West Bengal
132 WB013 Victoria, Kolkata - WBPCB Kolkata West Bengal

133 rows × 4 columns
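
One way to reproduce the three tables above is sketched below. It assumes the cleaning step keeps only stations whose `Status` is `Active`, then drops the `Status` column and renumbers the index; the tiny `stations` frame is a stand-in for the real 230-row table.

```python
import pandas as pd

# Stand-in for the stations table (the real one has 230 rows).
stations = pd.DataFrame({
    "StationId": ["AP001", "AP002", "AP005"],
    "City": ["Amaravati", "Rajamahendravaram", "Visakhapatnam"],
    "Status": ["Active", None, "Active"],
})

# ASSUMPTION: rows are kept only when Status == "Active", which also
# discards rows whose Status is missing (NaN).
active = (
    stations[stations["Status"] == "Active"]
    .drop(columns="Status")      # Status is constant after filtering
    .reset_index(drop=True)      # renumber rows from 0
)
```

Resetting the index matches the final table, where the surviving stations are numbered consecutively from 0.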

Merging the two dataframes, station_day and stations, on StationId¶

StationId Date PM2.5 PM10 NO NO2 NOx NH3 CO SO2 O3 Benzene Toluene Xylene AQI AQI_Bucket
0 AP001 2017-11-24 71.36 115.75 1.75 20.65 12.40 12.19 0.10 10.76 109.26 0.17 5.92 0.10 NaN NaN
1 AP001 2017-11-25 81.40 124.50 1.44 20.50 12.08 10.72 0.12 15.24 127.09 0.20 6.50 0.06 184.0 Moderate
2 AP001 2017-11-26 78.32 129.06 1.26 26.00 14.85 10.28 0.14 26.96 117.44 0.22 7.95 0.08 197.0 Moderate
3 AP001 2017-11-27 88.76 135.32 6.60 30.85 21.77 12.91 0.11 33.59 111.81 0.29 7.63 0.12 198.0 Moderate
4 AP001 2017-11-28 64.18 104.09 2.56 28.07 17.01 11.42 0.09 19.00 138.18 0.17 5.02 0.07 188.0 Moderate
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
108030 WB013 2020-06-27 8.65 16.46 NaN NaN NaN NaN 0.69 4.36 30.59 1.32 7.26 NaN 50.0 Good
108031 WB013 2020-06-28 11.80 18.47 NaN NaN NaN NaN 0.68 3.49 38.95 1.42 7.92 NaN 65.0 Satisfactory
108032 WB013 2020-06-29 18.60 32.26 13.65 200.87 214.20 11.40 0.78 5.12 38.17 3.52 8.64 NaN 63.0 Satisfactory
108033 WB013 2020-06-30 16.07 39.30 7.56 29.13 36.69 29.26 0.69 5.88 29.64 1.86 8.40 NaN 57.0 Satisfactory
108034 WB013 2020-07-01 10.50 36.50 7.78 22.50 30.25 27.23 0.58 2.80 13.10 1.31 7.39 NaN 59.0 Satisfactory

108035 rows × 16 columns

StationId StationName City State Date PM2.5 PM10 NO NO2 NOx NH3 CO SO2 O3 Benzene Toluene Xylene AQI AQI_Bucket
0 AP001 Secretariat, Amaravati - APPCB Amaravati Andhra Pradesh 2017-11-24 71.36 115.75 1.75 20.65 12.40 12.19 0.10 10.76 109.26 0.17 5.92 0.10 NaN NaN
1 AP001 Secretariat, Amaravati - APPCB Amaravati Andhra Pradesh 2017-11-25 81.40 124.50 1.44 20.50 12.08 10.72 0.12 15.24 127.09 0.20 6.50 0.06 184.0 Moderate
2 AP001 Secretariat, Amaravati - APPCB Amaravati Andhra Pradesh 2017-11-26 78.32 129.06 1.26 26.00 14.85 10.28 0.14 26.96 117.44 0.22 7.95 0.08 197.0 Moderate
3 AP001 Secretariat, Amaravati - APPCB Amaravati Andhra Pradesh 2017-11-27 88.76 135.32 6.60 30.85 21.77 12.91 0.11 33.59 111.81 0.29 7.63 0.12 198.0 Moderate
4 AP001 Secretariat, Amaravati - APPCB Amaravati Andhra Pradesh 2017-11-28 64.18 104.09 2.56 28.07 17.01 11.42 0.09 19.00 138.18 0.17 5.02 0.07 188.0 Moderate
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
107706 WB013 Victoria, Kolkata - WBPCB Kolkata West Bengal 2020-06-27 8.65 16.46 NaN NaN NaN NaN 0.69 4.36 30.59 1.32 7.26 NaN 50.0 Good
107707 WB013 Victoria, Kolkata - WBPCB Kolkata West Bengal 2020-06-28 11.80 18.47 NaN NaN NaN NaN 0.68 3.49 38.95 1.42 7.92 NaN 65.0 Satisfactory
107708 WB013 Victoria, Kolkata - WBPCB Kolkata West Bengal 2020-06-29 18.60 32.26 13.65 200.87 214.20 11.40 0.78 5.12 38.17 3.52 8.64 NaN 63.0 Satisfactory
107709 WB013 Victoria, Kolkata - WBPCB Kolkata West Bengal 2020-06-30 16.07 39.30 7.56 29.13 36.69 29.26 0.69 5.88 29.64 1.86 8.40 NaN 57.0 Satisfactory
107710 WB013 Victoria, Kolkata - WBPCB Kolkata West Bengal 2020-07-01 10.50 36.50 7.78 22.50 30.25 27.23 0.58 2.80 13.10 1.31 7.39 NaN 59.0 Satisfactory

107711 rows × 19 columns

StationId StationName City State Date PM2.5 PM10 NO NO2 NOx NH3 CO SO2 O3 Benzene Toluene Xylene AQI Air_quality
0 AP001 Secretariat, Amaravati - APPCB Amaravati Andhra Pradesh 2017-11-24 71.36 115.75 1.75 20.65 12.40 12.19 0.10 10.76 109.26 0.17 5.92 0.10 184.0 Moderate
1 AP001 Secretariat, Amaravati - APPCB Amaravati Andhra Pradesh 2017-11-25 81.40 124.50 1.44 20.50 12.08 10.72 0.12 15.24 127.09 0.20 6.50 0.06 184.0 Moderate
2 AP001 Secretariat, Amaravati - APPCB Amaravati Andhra Pradesh 2017-11-26 78.32 129.06 1.26 26.00 14.85 10.28 0.14 26.96 117.44 0.22 7.95 0.08 197.0 Moderate
3 AP001 Secretariat, Amaravati - APPCB Amaravati Andhra Pradesh 2017-11-27 88.76 135.32 6.60 30.85 21.77 12.91 0.11 33.59 111.81 0.29 7.63 0.12 198.0 Moderate
4 AP001 Secretariat, Amaravati - APPCB Amaravati Andhra Pradesh 2017-11-28 64.18 104.09 2.56 28.07 17.01 11.42 0.09 19.00 138.18 0.17 5.02 0.07 188.0 Moderate
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
107706 WB013 Victoria, Kolkata - WBPCB Kolkata West Bengal 2020-06-27 8.65 16.46 NaN NaN NaN NaN 0.69 4.36 30.59 1.32 7.26 NaN 50.0 Good
107707 WB013 Victoria, Kolkata - WBPCB Kolkata West Bengal 2020-06-28 11.80 18.47 13.65 200.87 214.20 11.40 0.68 3.49 38.95 1.42 7.92 NaN 65.0 Satisfactory
107708 WB013 Victoria, Kolkata - WBPCB Kolkata West Bengal 2020-06-29 18.60 32.26 13.65 200.87 214.20 11.40 0.78 5.12 38.17 3.52 8.64 NaN 63.0 Satisfactory
107709 WB013 Victoria, Kolkata - WBPCB Kolkata West Bengal 2020-06-30 16.07 39.30 7.56 29.13 36.69 29.26 0.69 5.88 29.64 1.86 8.40 NaN 57.0 Satisfactory
107710 WB013 Victoria, Kolkata - WBPCB Kolkata West Bengal 2020-07-01 10.50 36.50 7.78 22.50 30.25 27.23 0.58 2.80 13.10 1.31 7.39 NaN 59.0 Satisfactory

107711 rows × 19 columns
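
The merge and the column rename seen in these outputs can be sketched as follows. The inner join is an assumption inferred from the row count dropping from 108035 to 107711 (readings from stations missing in the cleaned station list fall away), and the small frames are illustrative stand-ins.

```python
import pandas as pd

# Cleaned station list (left side of the merge).
stations = pd.DataFrame({
    "StationId": ["AP001", "WB013"],
    "City": ["Amaravati", "Kolkata"],
})

# Daily readings; AP002 is absent from the cleaned station list.
station_day = pd.DataFrame({
    "StationId": ["AP001", "AP002", "WB013"],
    "Date": pd.to_datetime(["2017-11-25", "2017-11-25", "2020-06-27"]),
    "AQI": [184.0, 120.0, 50.0],
    "AQI_Bucket": ["Moderate", "Moderate", "Good"],
})

# ASSUMPTION: inner join on StationId, then AQI_Bucket renamed to Air_quality.
merged = (
    stations.merge(station_day, on="StationId", how="inner")
    .rename(columns={"AQI_Bucket": "Air_quality"})
)
```

Putting `stations` on the left keeps the station metadata columns ahead of the measurements, as in the merged output above.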

Cleaning of Cities' Data to remove the missing values¶

City Date PM2.5 PM10 NO NO2 NOx NH3 CO SO2 O3 Benzene Toluene Xylene AQI AQI_Bucket
0 Ahmedabad 2015-01-01 NaN NaN 0.92 18.22 17.15 NaN 0.92 27.64 133.36 0.00 0.02 0.00 NaN NaN
1 Ahmedabad 2015-01-02 NaN NaN 0.97 15.69 16.46 NaN 0.97 24.55 34.06 3.68 5.50 3.77 NaN NaN
2 Ahmedabad 2015-01-03 NaN NaN 17.40 19.30 29.70 NaN 17.40 29.07 30.70 6.80 16.40 2.25 NaN NaN
3 Ahmedabad 2015-01-04 NaN NaN 1.70 18.48 17.97 NaN 1.70 18.59 36.08 4.43 10.14 1.00 NaN NaN
4 Ahmedabad 2015-01-05 NaN NaN 22.10 21.42 37.76 NaN 22.10 39.33 39.31 7.01 18.89 2.78 NaN NaN
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
29526 Visakhapatnam 2020-06-27 15.02 50.94 7.68 25.06 19.54 12.47 0.47 8.55 23.30 2.24 12.07 0.73 41.0 Good
29527 Visakhapatnam 2020-06-28 24.38 74.09 3.42 26.06 16.53 11.99 0.52 12.72 30.14 0.74 2.21 0.38 70.0 Satisfactory
29528 Visakhapatnam 2020-06-29 22.91 65.73 3.45 29.53 18.33 10.71 0.48 8.42 30.96 0.01 0.01 0.00 68.0 Satisfactory
29529 Visakhapatnam 2020-06-30 16.64 49.97 4.05 29.26 18.80 10.03 0.52 9.84 28.30 0.00 0.00 0.00 54.0 Satisfactory
29530 Visakhapatnam 2020-07-01 15.00 66.00 0.40 26.85 14.05 5.20 0.59 2.10 17.05 NaN NaN NaN 50.0 Good

29531 rows × 16 columns

City Date PM2.5 PM10 NO NO2 NOx NH3 CO SO2 O3 Benzene Toluene Xylene AQI Air_quality
0 Ahmedabad 2015-01-01 NaN NaN 0.92 18.22 17.15 NaN 0.92 27.64 133.36 0.00 0.02 0.00 NaN NaN
1 Ahmedabad 2015-01-02 NaN NaN 0.97 15.69 16.46 NaN 0.97 24.55 34.06 3.68 5.50 3.77 NaN NaN
2 Ahmedabad 2015-01-03 NaN NaN 17.40 19.30 29.70 NaN 17.40 29.07 30.70 6.80 16.40 2.25 NaN NaN
3 Ahmedabad 2015-01-04 NaN NaN 1.70 18.48 17.97 NaN 1.70 18.59 36.08 4.43 10.14 1.00 NaN NaN
4 Ahmedabad 2015-01-05 NaN NaN 22.10 21.42 37.76 NaN 22.10 39.33 39.31 7.01 18.89 2.78 NaN NaN
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
29526 Visakhapatnam 2020-06-27 15.02 50.94 7.68 25.06 19.54 12.47 0.47 8.55 23.30 2.24 12.07 0.73 41.0 Good
29527 Visakhapatnam 2020-06-28 24.38 74.09 3.42 26.06 16.53 11.99 0.52 12.72 30.14 0.74 2.21 0.38 70.0 Satisfactory
29528 Visakhapatnam 2020-06-29 22.91 65.73 3.45 29.53 18.33 10.71 0.48 8.42 30.96 0.01 0.01 0.00 68.0 Satisfactory
29529 Visakhapatnam 2020-06-30 16.64 49.97 4.05 29.26 18.80 10.03 0.52 9.84 28.30 0.00 0.00 0.00 54.0 Satisfactory
29530 Visakhapatnam 2020-07-01 15.00 66.00 0.40 26.85 14.05 5.20 0.59 2.10 17.05 NaN NaN NaN 50.0 Good

29531 rows × 16 columns

Data Visualisation¶

Descriptive Analysis of Stations_day¶

(107711, 19)

Calculating the number of null values in each column.¶

StationId          0
StationName        0
City               0
State              0
Date               0
PM2.5          20417
PM10           41789
NO             15629
NO2            15058
NOx            14346
NH3            47245
CO             11386
SO2            23922
O3             24213
Benzene        30164
Toluene        37453
Xylene         84595
AQI            18958
Air_quality    18958
dtype: int64
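
Raw null counts are easier to compare as a share of all rows. The helper below is a minimal sketch of such a summary; the name `missing_summary` and the sample data are hypothetical.

```python
import numpy as np
import pandas as pd


def missing_summary(df):
    """Return null counts and percentages for columns with missing data."""
    counts = df.isnull().sum()
    percent = (100 * counts / len(df)).round(1)
    table = pd.DataFrame({
        "Missing Values": counts,
        "% of Total Values": percent,
    })
    # Keep only columns that have gaps, largest gap first.
    table = table[table["Missing Values"] > 0]
    return table.sort_values("Missing Values", ascending=False)


sample = pd.DataFrame({
    "City": ["A", "B", "C", "D"],
    "PM2.5": [10.0, np.nan, 12.0, 9.0],
    "Xylene": [np.nan, np.nan, np.nan, 0.1],
})
summary = missing_summary(sample)
```

Sorting by the count puts the sparsest columns first, which makes it easy to spot candidates for dropping or imputing.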

We use the missingno library to visualise the missing values, which helps in deciding how to handle or replace them.

Your selected dataframe has 19 columns.
There are 14 columns that have missing values.

             Missing Values  % of Total Values
Xylene                84595               78.5
NH3                   47245               43.9
PM10                  41789               38.8
Toluene               37453               34.8
Benzene               30164               28.0
O3                    24213               22.5
SO2                   23922               22.2
PM2.5                 20417               19.0
AQI                   18958               17.6
Air_quality           18958               17.6
NO                    15629               14.5
NO2                   15058               14.0
NOx                   14346               13.3
CO                    11386               10.6

<class 'pandas.core.frame.DataFrame'>
Int64Index: 107711 entries, 0 to 107710
Data columns (total 19 columns):
 #   Column       Non-Null Count   Dtype         
---  ------       --------------   -----         
 0   StationId    107711 non-null  object        
 1   StationName  107711 non-null  object        
 2   City         107711 non-null  object        
 3   State        107711 non-null  object        
 4   Date         107711 non-null  datetime64[ns]
 5   PM2.5        87294 non-null   float64       
 6   PM10         65922 non-null   float64       
 7   NO           92082 non-null   float64       
 8   NO2          92653 non-null   float64       
 9   NOx          93365 non-null   float64       
 10  NH3          60466 non-null   float64       
 11  CO           96325 non-null   float64       
 12  SO2          83789 non-null   float64       
 13  O3           83498 non-null   float64       
 14  Benzene      77547 non-null   float64       
 15  Toluene      70258 non-null   float64       
 16  Xylene       23116 non-null   float64       
 17  AQI          88753 non-null   float64       
 18  Air_quality  88753 non-null   object        
dtypes: datetime64[ns](1), float64(13), object(5)
memory usage: 16.4+ MB
column name:StationId   unique values:108
column name:StationName   unique values:108
column name:City   unique values:24
column name:State   unique values:21
column name:Date   unique values:2009
column name:PM2.5   unique values:22392
column name:PM10   unique values:29547
column name:NO   unique values:11914
column name:NO2   unique values:12051
column name:NOx   unique values:15585
column name:NH3   unique values:9112
column name:CO   unique values:2353
column name:SO2   unique values:5802
column name:O3   unique values:11161
column name:Benzene   unique values:3018
column name:Toluene   unique values:8714
column name:Xylene   unique values:1893
column name:AQI   unique values:931
column name:Air_quality   unique values:7

Visualising the yearly data of every pollutant

We combine Benzene, Toluene, and Xylene into a single BTX column (Benzene + Toluene + Xylene), since the three compounds are similar in their biological nature.

We also add a Particulate_Matter column (PM2.5 + PM10).

StationId StationName City State Date PM2.5 PM10 NO NO2 NOx NH3 CO SO2 O3 AQI Air_quality BTX Particulate_Matter
0 AP001 Secretariat, Amaravati - APPCB Amaravati Andhra Pradesh 2017-11-24 71.36 115.75 1.75 20.65 12.40 12.19 0.10 10.76 109.26 184.0 Moderate 6.19 187.11
1 AP001 Secretariat, Amaravati - APPCB Amaravati Andhra Pradesh 2017-11-25 81.40 124.50 1.44 20.50 12.08 10.72 0.12 15.24 127.09 184.0 Moderate 6.76 205.90
2 AP001 Secretariat, Amaravati - APPCB Amaravati Andhra Pradesh 2017-11-26 78.32 129.06 1.26 26.00 14.85 10.28 0.14 26.96 117.44 197.0 Moderate 8.25 207.38
3 AP001 Secretariat, Amaravati - APPCB Amaravati Andhra Pradesh 2017-11-27 88.76 135.32 6.60 30.85 21.77 12.91 0.11 33.59 111.81 198.0 Moderate 8.04 224.08
4 AP001 Secretariat, Amaravati - APPCB Amaravati Andhra Pradesh 2017-11-28 64.18 104.09 2.56 28.07 17.01 11.42 0.09 19.00 138.18 188.0 Moderate 5.26 168.27
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
107706 WB013 Victoria, Kolkata - WBPCB Kolkata West Bengal 2020-06-27 8.65 16.46 NaN NaN NaN NaN 0.69 4.36 30.59 50.0 Good NaN 25.11
107707 WB013 Victoria, Kolkata - WBPCB Kolkata West Bengal 2020-06-28 11.80 18.47 13.65 200.87 214.20 11.40 0.68 3.49 38.95 65.0 Satisfactory NaN 30.27
107708 WB013 Victoria, Kolkata - WBPCB Kolkata West Bengal 2020-06-29 18.60 32.26 13.65 200.87 214.20 11.40 0.78 5.12 38.17 63.0 Satisfactory NaN 50.86
107709 WB013 Victoria, Kolkata - WBPCB Kolkata West Bengal 2020-06-30 16.07 39.30 7.56 29.13 36.69 29.26 0.69 5.88 29.64 57.0 Satisfactory NaN 55.37
107710 WB013 Victoria, Kolkata - WBPCB Kolkata West Bengal 2020-07-01 10.50 36.50 7.78 22.50 30.25 27.23 0.58 2.80 13.10 59.0 Satisfactory NaN 47.00

107711 rows × 18 columns
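
The two derived columns can be reproduced with plain column addition, as sketched below on two rows taken from the table above. Plain `+` propagates NaN, which agrees with the output: BTX is NaN wherever Xylene is missing.

```python
import numpy as np
import pandas as pd

# Two rows from the station data above (AP001 2017-11-24, WB013 2020-06-27).
df = pd.DataFrame({
    "PM2.5": [71.36, 8.65],
    "PM10": [115.75, 16.46],
    "Benzene": [0.17, 1.32],
    "Toluene": [5.92, 7.26],
    "Xylene": [0.10, np.nan],
})

# Sum of the three aromatic compounds; NaN if any component is missing.
df["BTX"] = df["Benzene"] + df["Toluene"] + df["Xylene"]

# Sum of fine and coarse particulate matter.
df["Particulate_Matter"] = df["PM2.5"] + df["PM10"]

# The individual BTX components no longer appear in the output table.
df = df.drop(columns=["Benzene", "Toluene", "Xylene"])
```

For the first row this gives 0.17 + 5.92 + 0.10 = 6.19 for BTX and 71.36 + 115.75 = 187.11 for Particulate_Matter, matching the table.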

Descriptive Analysis of city_day¶

(29531, 16)

Calculating the number of null values in each column.¶

City               0
Date               0
PM2.5           4321
PM10           10866
NO              3276
NO2             3278
NOx             3980
NH3            10061
CO              1745
SO2             3510
O3              3664
Benzene         5298
Toluene         7739
Xylene         17878
AQI             4174
Air_quality     4174
dtype: int64

We use the missingno library to visualise the missing values, which helps in deciding how to handle or replace them.

Your selected dataframe has 16 columns.
There are 14 columns that have missing values.

             Missing Values  % of Total Values
Xylene                17878               60.5
PM10                  10866               36.8
NH3                   10061               34.1
Toluene                7739               26.2
Benzene                5298               17.9
PM2.5                  4321               14.6
AQI                    4174               14.1
Air_quality            4174               14.1
NOx                    3980               13.5
O3                     3664               12.4
SO2                    3510               11.9
NO2                    3278               11.1
NO                     3276               11.1
CO                     1745                5.9

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 29531 entries, 0 to 29530
Data columns (total 16 columns):
 #   Column       Non-Null Count  Dtype         
---  ------       --------------  -----         
 0   City         29531 non-null  object        
 1   Date         29531 non-null  datetime64[ns]
 2   PM2.5        25210 non-null  float64       
 3   PM10         18665 non-null  float64       
 4   NO           26255 non-null  float64       
 5   NO2          26253 non-null  float64       
 6   NOx          25551 non-null  float64       
 7   NH3          19470 non-null  float64       
 8   CO           27786 non-null  float64       
 9   SO2          26021 non-null  float64       
 10  O3           25867 non-null  float64       
 11  Benzene      24233 non-null  float64       
 12  Toluene      21792 non-null  float64       
 13  Xylene       11653 non-null  float64       
 14  AQI          25357 non-null  float64       
 15  Air_quality  25357 non-null  object        
dtypes: datetime64[ns](1), float64(13), object(2)
memory usage: 3.6+ MB
column name:City   unique values:26
column name:Date   unique values:2009
column name:PM2.5   unique values:11717
column name:PM10   unique values:12572
column name:NO   unique values:5777
column name:NO2   unique values:7405
column name:NOx   unique values:8157
column name:NH3   unique values:5923
column name:CO   unique values:1780
column name:SO2   unique values:4762
column name:O3   unique values:7700
column name:Benzene   unique values:1874
column name:Toluene   unique values:3609
column name:Xylene   unique values:1562
column name:AQI   unique values:830
column name:Air_quality   unique values:7

Visualising the yearly data of every pollutant

We combine Benzene, Toluene, and Xylene into a single BTX column (Benzene + Toluene + Xylene), since the three compounds are similar in their biological nature.

We also add a Particulate_Matter column (PM2.5 + PM10).

City Date PM2.5 PM10 NO NO2 NOx NH3 CO SO2 O3 AQI Air_quality BTX Particulate_Matter
0 Ahmedabad 2015-01-01 NaN NaN 0.92 18.22 17.15 NaN 0.92 27.64 133.36 NaN NaN 0.02 NaN
1 Ahmedabad 2015-01-02 NaN NaN 0.97 15.69 16.46 NaN 0.97 24.55 34.06 NaN NaN 12.95 NaN
2 Ahmedabad 2015-01-03 NaN NaN 17.40 19.30 29.70 NaN 17.40 29.07 30.70 NaN NaN 25.45 NaN
3 Ahmedabad 2015-01-04 NaN NaN 1.70 18.48 17.97 NaN 1.70 18.59 36.08 NaN NaN 15.57 NaN
4 Ahmedabad 2015-01-05 NaN NaN 22.10 21.42 37.76 NaN 22.10 39.33 39.31 NaN NaN 28.68 NaN
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
29526 Visakhapatnam 2020-06-27 15.02 50.94 7.68 25.06 19.54 12.47 0.47 8.55 23.30 41.0 Good 15.04 65.96
29527 Visakhapatnam 2020-06-28 24.38 74.09 3.42 26.06 16.53 11.99 0.52 12.72 30.14 70.0 Satisfactory 3.33 98.47
29528 Visakhapatnam 2020-06-29 22.91 65.73 3.45 29.53 18.33 10.71 0.48 8.42 30.96 68.0 Satisfactory 0.02 88.64
29529 Visakhapatnam 2020-06-30 16.64 49.97 4.05 29.26 18.80 10.03 0.52 9.84 28.30 54.0 Satisfactory 0.00 66.61
29530 Visakhapatnam 2020-07-01 15.00 66.00 0.40 26.85 14.05 5.20 0.59 2.10 17.05 50.0 Good NaN 81.00

29531 rows × 15 columns
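
The yearly plots are not preserved in this text. A yearly view of each pollutant could be built as sketched below, assuming the daily values are averaged per calendar year; the readings in `city_day` are illustrative, not taken from the dataset.

```python
import pandas as pd

# Illustrative daily readings for one city across two years.
city_day = pd.DataFrame({
    "City": ["Visakhapatnam"] * 4,
    "Date": pd.to_datetime(
        ["2019-06-29", "2019-06-30", "2020-06-29", "2020-06-30"]
    ),
    "PM2.5": [30.0, 40.0, 20.0, 10.0],
    "CO": [0.6, 0.8, 0.4, 0.6],
})

pollutants = ["PM2.5", "CO"]

# ASSUMPTION: yearly aggregation by mean; NaN readings are skipped by mean().
yearly = city_day.groupby(city_day["Date"].dt.year)[pollutants].mean()
yearly.index.name = "Year"

# yearly.plot(subplots=True) would draw one panel per pollutant
# (requires matplotlib).
```

Grouping on `Date.dt.year` relies on `Date` being a datetime column, as the `info()` output above shows.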

Data Statistics¶

Most and Least Polluted Stations based on the given dataset¶

Most Polluted¶

StationId StationName City State PM2.5 PM10 NO NO2 NOx NH3 CO SO2 O3 AQI Air_quality BTX Particulate_Matter
Date
2017-11-24 AP001 Secretariat, Amaravati - APPCB Amaravati Andhra Pradesh 71.36 115.75 1.75 20.65 12.40 12.19 0.10 10.76 109.26 184.0 Moderate 6.19 187.11
2017-11-25 AP001 Secretariat, Amaravati - APPCB Amaravati Andhra Pradesh 81.40 124.50 1.44 20.50 12.08 10.72 0.12 15.24 127.09 184.0 Moderate 6.76 205.90
2017-11-26 AP001 Secretariat, Amaravati - APPCB Amaravati Andhra Pradesh 78.32 129.06 1.26 26.00 14.85 10.28 0.14 26.96 117.44 197.0 Moderate 8.25 207.38
2017-11-27 AP001 Secretariat, Amaravati - APPCB Amaravati Andhra Pradesh 88.76 135.32 6.60 30.85 21.77 12.91 0.11 33.59 111.81 198.0 Moderate 8.04 224.08
2017-11-28 AP001 Secretariat, Amaravati - APPCB Amaravati Andhra Pradesh 64.18 104.09 2.56 28.07 17.01 11.42 0.09 19.00 138.18 188.0 Moderate 5.26 168.27
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
2020-06-27 WB013 Victoria, Kolkata - WBPCB Kolkata West Bengal 8.65 16.46 NaN NaN NaN NaN 0.69 4.36 30.59 50.0 Good NaN 25.11
2020-06-28 WB013 Victoria, Kolkata - WBPCB Kolkata West Bengal 11.80 18.47 13.65 200.87 214.20 11.40 0.68 3.49 38.95 65.0 Satisfactory NaN 30.27
2020-06-29 WB013 Victoria, Kolkata - WBPCB Kolkata West Bengal 18.60 32.26 13.65 200.87 214.20 11.40 0.78 5.12 38.17 63.0 Satisfactory NaN 50.86
2020-06-30 WB013 Victoria, Kolkata - WBPCB Kolkata West Bengal 16.07 39.30 7.56 29.13 36.69 29.26 0.69 5.88 29.64 57.0 Satisfactory NaN 55.37
2020-07-01 WB013 Victoria, Kolkata - WBPCB Kolkata West Bengal 10.50 36.50 7.78 22.50 30.25 27.23 0.58 2.80 13.10 59.0 Satisfactory NaN 47.00

107711 rows × 17 columns

  StationName PM2.5
0 Anand Vihar, Delhi - DPCC 152.350000
1 Talkatora District Industries Center, Lucknow - CPCB 134.690000
2 DTU, Delhi - CPCB 131.080000
3 IGSC Planetarium Complex, Patna - BSPCB 130.450000
4 Jahangirpuri, Delhi - DPCC 128.120000
5 Wazirpur, Delhi - DPCC 127.510000
6 Mundka, Delhi - DPCC 122.460000
7 Rohini, Delhi - DPCC 122.420000
8 Bawana, Delhi - DPCC 120.950000
9 Burari Crossing, Delhi - IMD 120.820000
  StationName PM10
0 Anand Vihar, Delhi - DPCC 358.120000
1 Wazirpur, Delhi - DPCC 277.420000
2 Dwarka-Sector 8, Delhi - DPCC 276.470000
3 Mundka, Delhi - DPCC 269.130000
4 Jahangirpuri, Delhi - DPCC 259.450000
5 Sirifort, Delhi - CPCB 252.480000
6 Rohini, Delhi - DPCC 247.870000
7 NSIT Dwarka, Delhi - CPCB 242.770000
8 R K Puram, Delhi - DPCC 242.410000
9 DTU, Delhi - CPCB 236.060000
  StationName Particulate_Matter
0 Anand Vihar, Delhi - DPCC 509.740000
1 Wazirpur, Delhi - DPCC 405.510000
2 Mundka, Delhi - DPCC 391.590000
3 Jahangirpuri, Delhi - DPCC 387.750000
4 Dwarka-Sector 8, Delhi - DPCC 379.790000
5 Rohini, Delhi - DPCC 370.440000
6 R K Puram, Delhi - DPCC 361.170000
7 Bawana, Delhi - DPCC 354.700000
8 Sirifort, Delhi - CPCB 349.750000
9 DTU, Delhi - CPCB 346.270000
  StationName NO
0 Samanpura, Patna - BSPCB 124.000000
1 Anand Vihar, Delhi - DPCC 90.860000
2 Pusa, Delhi - DPCC 73.060000
3 DRM Office Danapur, Patna - BSPCB 64.930000
4 Major Dhyan Chand National Stadium, Delhi - DPCC 57.780000
5 R K Puram, Delhi - DPCC 54.420000
6 Chhatrapati Shivaji Intl. Airport (T2), Mumbai - MPCB 53.380000
7 Jawaharlal Nehru Stadium, Delhi - DPCC 52.570000
8 ITO, Delhi - CPCB 50.600000
9 Sirifort, Delhi - CPCB 46.990000
  StationName NO2
0 Anand Vihar, Delhi - DPCC 88.720000
1 Punjabi Bagh, Delhi - DPCC 73.280000
2 Rajbansi Nagar, Patna - BSPCB 65.900000
3 Jahangirpuri, Delhi - DPCC 65.860000
4 Jawaharlal Nehru Stadium, Delhi - DPCC 63.340000
5 R K Puram, Delhi - DPCC 63.010000
6 Pusa, Delhi - DPCC 59.570000
7 Sirifort, Delhi - CPCB 58.860000
8 Maninagar, Ahmedabad - GPCB 58.850000
9 Major Dhyan Chand National Stadium, Delhi - DPCC 58.720000
  StationName NOx
0 Anand Vihar, Delhi - DPCC 148.780000
1 Samanpura, Patna - BSPCB 141.460000
2 East Arjun Nagar, Delhi - CPCB 121.800000
3 Pusa, Delhi - DPCC 91.230000
4 R K Puram, Delhi - DPCC 86.360000
5 Chhatrapati Shivaji Intl. Airport (T2), Mumbai - MPCB 84.670000
6 Major Dhyan Chand National Stadium, Delhi - DPCC 82.100000
7 Jawaharlal Nehru Stadium, Delhi - DPCC 81.310000
8 Sion, Mumbai - MPCB 74.310000
9 Victoria, Kolkata - WBPCB 73.730000
  StationName NH3
0 Manali, Chennai - CPCB 65.360000
1 Anand Vihar, Delhi - DPCC 55.780000
2 Jahangirpuri, Delhi - DPCC 55.670000
3 Rohini, Delhi - DPCC 53.010000
4 ITO, Delhi - CPCB 52.060000
5 IGSC Planetarium Complex, Patna - BSPCB 51.240000
6 NSIT Dwarka, Delhi - CPCB 48.870000
7 Patparganj, Delhi - DPCC 48.850000
8 Shadipur, Delhi - CPCB 45.680000
9 Mundka, Delhi - DPCC 45.510000
  StationName CO
0 Maninagar, Ahmedabad - GPCB 22.360000
1 BWSSB Kadabesanahalli, Bengaluru - CPCB 3.580000
2 Shadipur, Delhi - CPCB 3.480000
3 Peenya, Bengaluru - CPCB 3.010000
4 NSIT Dwarka, Delhi - CPCB 2.840000
5 Central School, Lucknow - CPCB 2.310000
6 Lalbagh, Lucknow - CPCB 2.260000
7 Anand Vihar, Delhi - DPCC 2.200000
8 ITO, Delhi - CPCB 2.120000
9 Alandur Bus Depot, Chennai - CPCB 1.950000
  StationName SO2
0 Maninagar, Ahmedabad - GPCB 55.250000
1 Tata Stadium, Jorapokhar - JSPCB 34.640000
2 Talcher Coalfields,Talcher - OSPCB 28.410000
3 Pusa, Delhi - IMD 27.630000
4 Lodhi Road, Delhi - IMD 23.250000
5 North Campus, DU, Delhi - IMD 23.250000
6 IGSC Planetarium Complex, Patna - BSPCB 22.980000
7 R K Puram, Delhi - DPCC 20.690000
8 Punjabi Bagh, Delhi - DPCC 20.080000
9 Alipur, Delhi - DPCC 19.860000
  StationName O3
0 Punjabi Bagh, Delhi - DPCC 230.100000
1 Sector-51, Gurugram - HSPCB 80.750000
2 Manali Village, Chennai - TNPCB 69.610000
3 T T Nagar, Bhopal - MPPCB 59.940000
4 R K Puram, Delhi - DPCC 54.780000
5 Shastri Nagar, Jaipur - RSPCB 53.200000
6 Teri Gram, Gurugram - HSPCB 52.960000
7 Adarsh Nagar, Jaipur - RSPCB 51.530000
8 Sirifort, Delhi - CPCB 49.010000
9 Hombegowda Nagar, Bengaluru - KSPCB 47.610000
  StationName BTX
0 Jadavpur, Kolkata - WBPCB 220.430000
1 Talkatora District Industries Center, Lucknow - CPCB 56.030000
2 Maninagar, Ahmedabad - GPCB 37.630000
3 Burari Crossing, Delhi - IMD 33.190000
4 IDA Pashamylaram, Hyderabad - TSPCB 31.310000
5 Fort William, Kolkata - WBPCB 29.760000
6 Bidhannagar, Kolkata - WBPCB 27.680000
7 Ballygunge, Kolkata - WBPCB 26.130000
8 Mandir Marg, Delhi - DPCC 20.830000
9 Teri Gram, Gurugram - HSPCB 20.500000
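
Rankings like these can be produced by averaging each pollutant per station and sorting. The sketch below assumes the tables rank by mean concentration rounded to two decimals; `rank_by_mean` is a hypothetical helper, and the same call works for cities by passing `"City"` as the key.

```python
import pandas as pd


def rank_by_mean(df, key, pollutant, n=10, most=True):
    """Rank groups by mean pollutant level; most=False gives the cleanest."""
    means = df.groupby(key)[pollutant].mean().round(2)
    ordered = means.sort_values(ascending=not most)
    return ordered.head(n).reset_index()


# Illustrative readings; None becomes NaN and is skipped by mean().
sample = pd.DataFrame({
    "StationName": ["A", "A", "B", "B", "C"],
    "PM2.5": [150.0, 154.0, 30.0, None, 90.0],
})

top = rank_by_mean(sample, "StationName", "PM2.5", n=2)
bottom = rank_by_mean(sample, "StationName", "PM2.5", n=2, most=False)
```

Because the mean skips missing readings, a station with few valid days can rank very high or very low on the strength of a handful of values, so extreme entries deserve a check against their data coverage.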

Least Polluted¶

  StationName PM2.5
0 City Railway Station, Bengaluru - KSPCB 9.000000
1 East Arjun Nagar, Delhi - CPCB 11.110000
2 Sikulpuikawn, Aizawl - Mizoram PCB 16.850000
3 Manali Village, Chennai - TNPCB 24.480000
4 Plammoodu, Thiruvananthapuram - Kerala PCB 27.220000
5 Hombegowda Nagar, Bengaluru - KSPCB 27.520000
6 Hebbal, Bengaluru - KSPCB 28.930000
7 Kariavattom, Thiruvananthapuram - Kerala PCB 28.980000
8 SIDCO Kurichi, Coimbatore - TNPCB 29.090000
9 Borivali East, Mumbai - MPCB 29.290000
  StationName PM10
0 East Arjun Nagar, Delhi - CPCB 6.320000
1 Sikulpuikawn, Aizawl - Mizoram PCB 23.340000
2 Talkatora District Industries Center, Lucknow - CPCB 26.860000
3 SIDCO Kurichi, Coimbatore - TNPCB 37.740000
4 BWSSB Kadabesanahalli, Bengaluru - CPCB 40.750000
5 Lumpyngngad, Shillong - Meghalaya PCB 41.640000
6 Velachery Res. Area, Chennai - CPCB 43.490000
7 Alandur Bus Depot, Chennai - CPCB 48.550000
8 Kariavattom, Thiruvananthapuram - Kerala PCB 51.490000
9 Plammoodu, Thiruvananthapuram - Kerala PCB 52.190000
  StationName Particulate_Matter
0 East Arjun Nagar, Delhi - CPCB 17.430000
1 City Railway Station, Bengaluru - KSPCB 40.000000
2 Sikulpuikawn, Aizawl - Mizoram PCB 40.190000
3 Sanegurava Halli, Bengaluru - KSPCB 45.960000
4 Alandur Bus Depot, Chennai - CPCB 56.320000
5 Talkatora District Industries Center, Lucknow - CPCB 57.880000
6 Velachery Res. Area, Chennai - CPCB 60.770000
7 BWSSB Kadabesanahalli, Bengaluru - CPCB 62.450000
8 SIDCO Kurichi, Coimbatore - TNPCB 66.900000
9 Lumpyngngad, Shillong - Meghalaya PCB 72.880000
  StationName NO
0 Lumpyngngad, Shillong - Meghalaya PCB 0.920000
1 Bollaram Industrial Area, Hyderabad - TSPCB 2.960000
2 Borivali East, Mumbai - MPCB 3.260000
3 Plammoodu, Thiruvananthapuram - Kerala PCB 3.410000
4 ICRISAT Patancheru, Hyderabad - TSPCB 3.550000
5 Hombegowda Nagar, Bengaluru - KSPCB 3.570000
6 Sector-51, Gurugram - HSPCB 3.810000
7 Secretariat, Amaravati - APPCB 4.450000
8 Peenya, Bengaluru - CPCB 4.510000
9 Powai, Mumbai - MPCB 4.770000
  StationName NO2
0 Sikulpuikawn, Aizawl - Mizoram PCB 0.390000
1 Lumpyngngad, Shillong - Meghalaya PCB 2.770000
2 Borivali East, Mumbai - MPCB 4.650000
3 Teri Gram, Gurugram - HSPCB 4.910000
4 Manali Village, Chennai - TNPCB 9.110000
5 Plammoodu, Thiruvananthapuram - Kerala PCB 9.190000
6 Tata Stadium, Jorapokhar - JSPCB 9.370000
7 Govt. High School Shikarpur, Patna - BSPCB 9.620000
8 Powai, Mumbai - MPCB 10.390000
9 Sector-25, Chandigarh - CPCC 11.630000
  StationName NOx
0 Lumpyngngad, Shillong - Meghalaya PCB 1.000000
1 Teri Gram, Gurugram - HSPCB 5.960000
2 Tata Stadium, Jorapokhar - JSPCB 7.410000
3 Plammoodu, Thiruvananthapuram - Kerala PCB 7.550000
4 Borivali East, Mumbai - MPCB 7.710000
5 Govt. High School Shikarpur, Patna - BSPCB 9.440000
6 Sector-51, Gurugram - HSPCB 11.210000
7 ICRISAT Patancheru, Hyderabad - TSPCB 11.220000
8 Bollaram Industrial Area, Hyderabad - TSPCB 12.140000
9 Sikulpuikawn, Aizawl - Mizoram PCB 12.610000
  StationName NH3
0 Lumpyngngad, Shillong - Meghalaya PCB 2.810000
1 Plammoodu, Thiruvananthapuram - Kerala PCB 5.030000
2 Worli, Mumbai - MPCB 6.560000
3 Tata Stadium, Jorapokhar - JSPCB 7.000000
4 Bandra, Mumbai - MPCB 7.160000
5 Colaba, Mumbai - MPCB 8.020000
6 Kariavattom, Thiruvananthapuram - Kerala PCB 8.040000
7 Borivali East, Mumbai - MPCB 8.170000
8 Nishant Ganj, Lucknow - UPPCB 8.910000
9 SIDCO Kurichi, Coimbatore - TNPCB 9.310000
  StationName CO
0 Lumpyngngad, Shillong - Meghalaya PCB 0.240000
1 Sikulpuikawn, Aizawl - Mizoram PCB 0.280000
2 Borivali East, Mumbai - MPCB 0.370000
3 Bollaram Industrial Area, Hyderabad - TSPCB 0.410000
4 Worli, Mumbai - MPCB 0.410000
5 Kurla, Mumbai - MPCB 0.420000
6 Colaba, Mumbai - MPCB 0.460000
7 Sanegurava Halli, Bengaluru - KSPCB 0.480000
8 ICRISAT Patancheru, Hyderabad - TSPCB 0.490000
9 Sion, Mumbai - MPCB 0.490000
  StationName SO2
0 Kariavattom, Thiruvananthapuram - Kerala PCB 3.230000
1 BWSSB Kadabesanahalli, Bengaluru - CPCB 3.810000
2 Sanegurava Halli, Bengaluru - KSPCB 3.880000
3 DRM Office Danapur, Patna - BSPCB 4.780000
4 Silk Board, Bengaluru - KSPCB 4.820000
5 Zoo Park, Hyderabad - TSPCB 4.830000
6 Patparganj, Delhi - DPCC 4.840000
7 Central University, Hyderabad - TSPCB 5.150000
8 Jayanagar 5th Block, Bengaluru - KSPCB 5.270000
9 Muradpur, Patna - BSPCB 5.310000
  StationName O3
0 Sikulpuikawn, Aizawl - Mizoram PCB 3.570000
1 Sanegurava Halli, Bengaluru - KSPCB 6.290000
2 Govt. High School Shikarpur, Patna - BSPCB 7.060000
3 Muradpur, Patna - BSPCB 11.920000
4 Chhatrapati Shivaji Intl. Airport (T2), Mumbai - MPCB 12.920000
5 GM Office, Brajrajnagar - OSPCB 16.790000
6 Vasai West, Mumbai - MPCB 17.010000
7 Talcher Coalfields,Talcher - OSPCB 17.570000
8 Manali, Chennai - CPCB 18.850000
9 Borivali East, Mumbai - MPCB 19.020000
  StationName BTX
0 T T Nagar, Bhopal - MPPCB 0.000000
1 Bandra, Mumbai - MPCB 0.030000
2 SIDCO Kurichi, Coimbatore - TNPCB 0.200000
3 Lodhi Road, Delhi - IMD 1.960000
4 North Campus, DU, Delhi - IMD 2.380000
5 CRRI Mathura Road, Delhi - IMD 2.770000
6 Bollaram Industrial Area, Hyderabad - TSPCB 2.960000
7 ICRISAT Patancheru, Hyderabad - TSPCB 3.440000
8 Pusa, Delhi - IMD 3.710000
9 Secretariat, Amaravati - APPCB 3.860000

Most and Least Polluted Cities based on the given dataset¶

Most Polluted¶

  City PM2.5
0 Patna 123.110000
1 Gurugram 117.340000
2 Delhi 117.150000
3 Lucknow 109.940000
4 Ahmedabad 67.820000
5 Jorapokhar 64.670000
6 Brajrajnagar 64.360000
7 Kolkata 64.120000
8 Guwahati 63.940000
9 Talcher 61.010000
  City PM10
0 Delhi 232.730000
1 Gurugram 192.490000
2 Talcher 165.290000
3 Jorapokhar 150.390000
4 Patna 126.910000
5 Brajrajnagar 124.940000
6 Jaipur 123.400000
7 Bhopal 119.210000
8 Guwahati 116.600000
9 Kolkata 115.260000
  City NO
0 Kochi 71.370000
1 Delhi 38.980000
2 Patna 31.800000
3 Talcher 31.770000
4 Mumbai 31.560000
5 Kolkata 26.840000
6 Ernakulam 23.570000
7 Ahmedabad 22.590000
8 Guwahati 20.010000
9 Brajrajnagar 19.200000
  City NO2
0 Ahmedabad 58.850000
1 Delhi 50.800000
2 Kolkata 40.300000
3 Patna 37.560000
4 Visakhapatnam 37.040000
5 Lucknow 33.220000
6 Jaipur 32.360000
7 Bhopal 31.290000
8 Coimbatore 28.970000
9 Hyderabad 28.430000
  City NOx
0 Jorapokhar 99.990000
1 Kochi 68.410000
2 Kolkata 63.340000
3 Delhi 58.570000
4 Mumbai 55.180000
5 Ahmedabad 47.370000
6 Patna 46.110000
7 Guwahati 44.250000
8 Jaipur 39.650000
9 Amritsar 35.690000
  City NH3
0 Chennai 63.400000
1 Delhi 41.990000
2 Brajrajnagar 36.960000
3 Chandigarh 30.600000
4 Lucknow 29.220000
5 Ahmedabad 26.640000
6 Jaipur 26.470000
7 Gurugram 26.210000
8 Aizawl 22.310000
9 Bengaluru 22.160000
  City CO
0 Ahmedabad 22.360000
1 Lucknow 2.130000
2 Delhi 1.980000
3 Talcher 1.850000
4 Bengaluru 1.840000
5 Brajrajnagar 1.790000
6 Ernakulam 1.630000
7 Patna 1.500000
8 Kochi 1.300000
9 Gurugram 1.260000
  City SO2
0 Ahmedabad 55.250000
1 Jorapokhar 34.640000
2 Talcher 28.410000
3 Patna 22.020000
4 Kochi 17.600000
5 Delhi 15.900000
6 Mumbai 15.710000
7 Guwahati 14.660000
8 Amaravati 14.270000
9 Bhopal 13.080000
  City O3
0 Bhopal 59.940000
1 Delhi 51.290000
2 Jaipur 46.600000
3 Ahmedabad 39.310000
4 Amaravati 38.130000
5 Visakhapatnam 37.600000
6 Patna 37.070000
7 Lucknow 36.990000
8 Thiruvananthapuram 34.520000
9 Gurugram 34.250000
  City BTX
0 Kolkata 38.110000
1 Ahmedabad 37.630000
2 Delhi 26.780000
3 Thiruvananthapuram 22.350000
4 Patna 17.270000
5 Visakhapatnam 15.080000
6 Gurugram 14.640000
7 Amritsar 14.500000
8 Hyderabad 10.720000
9 Lucknow 10.410000

Least Polluted¶

  City PM2.5
0 Aizawl 16.850000
1 Ernakulam 24.960000
2 Thiruvananthapuram 27.990000
3 Coimbatore 29.730000
4 Shillong 30.290000
5 Kochi 31.430000
6 Mumbai 35.260000
7 Bengaluru 36.090000
8 Amaravati 37.640000
9 Chandigarh 41.060000
  City PM10
0 Aizawl 23.340000
1 Coimbatore 39.230000
2 Shillong 41.640000
3 Ernakulam 48.310000
4 Thiruvananthapuram 52.790000
5 Chennai 62.950000
6 Kochi 67.340000
7 Amaravati 76.310000
8 Bengaluru 83.590000
9 Chandigarh 85.660000
  City NO
0 Shillong 0.920000
1 Thiruvananthapuram 3.440000
2 Amaravati 4.450000
3 Bhopal 7.020000
4 Coimbatore 7.530000
5 Hyderabad 7.830000
6 Chennai 9.190000
7 Bengaluru 9.400000
8 Aizawl 9.410000
9 Chandigarh 10.470000
  City NO2
0 Aizawl 0.390000
1 Shillong 2.770000
2 Ernakulam 3.630000
3 Thiruvananthapuram 9.370000
4 Jorapokhar 9.370000
5 Chandigarh 11.630000
6 Guwahati 13.560000
7 Talcher 13.770000
8 Kochi 14.860000
9 Brajrajnagar 16.530000
  City NOx
0 Shillong 1.000000
1 Thiruvananthapuram 8.160000
2 Aizawl 12.610000
3 Chandigarh 15.070000
4 Amaravati 15.390000
5 Chennai 17.660000
6 Hyderabad 19.460000
7 Bengaluru 19.700000
8 Bhopal 22.380000
9 Lucknow 22.460000
  City NH3
0 Shillong 2.810000
1 Thiruvananthapuram 5.070000
2 Jorapokhar 7.000000
3 Kochi 7.980000
4 Coimbatore 9.400000
5 Visakhapatnam 10.970000
6 Guwahati 11.100000
7 Talcher 11.600000
8 Amaravati 12.030000
9 Mumbai 13.820000
  City CO
0 Shillong 0.240000
1 Aizawl 0.280000
2 Amritsar 0.550000
3 Mumbai 0.570000
4 Hyderabad 0.590000
5 Amaravati 0.630000
6 Chandigarh 0.630000
7 Jorapokhar 0.650000
8 Guwahati 0.730000
9 Visakhapatnam 0.740000
  City SO2
0 Ernakulam 3.180000
1 Bengaluru 5.510000
2 Thiruvananthapuram 5.650000
3 Shillong 6.620000
4 Aizawl 7.380000
5 Chennai 7.870000
6 Amritsar 8.130000
7 Kolkata 8.530000
8 Coimbatore 8.590000
9 Hyderabad 9.190000
  City O3
0 Aizawl 3.570000
1 Kochi 3.820000
2 Ernakulam 5.960000
3 Brajrajnagar 16.850000
4 Talcher 17.570000
5 Chandigarh 20.050000
6 Amritsar 22.440000
7 Guwahati 25.060000
8 Shillong 27.690000
9 Coimbatore 28.820000
  City BTX
0 Mumbai 0.030000
1 Ernakulam 2.010000
2 Amaravati 3.860000
3 Aizawl 6.190000
4 Guwahati 7.260000
5 Brajrajnagar 7.900000
6 Chandigarh 9.090000
7 Coimbatore 9.840000
8 Lucknow 10.410000
9 Hyderabad 10.720000

Data Interpretation¶

Stations¶

Missing values are filled with the median of each column.
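A minimal sketch of median filling with pandas, on a toy frame rather than the real station data. The frame name `stations` and the choice to fill only numeric columns are assumptions made for illustration; the notebook's own code is not shown in this export.

```python
import numpy as np
import pandas as pd

# Toy frame standing in for the station data (a few illustrative readings).
stations = pd.DataFrame({
    "City": ["Amaravati"] * 4,
    "PM2.5": [71.36, np.nan, 78.32, 88.76],
    "PM10": [115.75, 124.50, np.nan, 135.32],
})

# Fill gaps in each numeric column with that column's median.
numeric_cols = stations.select_dtypes(include="number").columns
filled = stations.copy()
filled[numeric_cols] = filled[numeric_cols].fillna(filled[numeric_cols].median())

# Every column should now report zero missing values.
print(filled.isnull().sum())
```

The median is used instead of the mean because it is less sensitive to extreme readings, such as the maximum values of 1000 seen in the PM columns.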

StationId StationName City State Date PM2.5 PM10 NO NO2 NOx NH3 CO SO2 O3 Benzene Toluene Xylene AQI Air_quality
0 AP001 Secretariat, Amaravati - APPCB Amaravati Andhra Pradesh 2017-11-24 71.36 115.75 1.75 20.65 12.40 12.19 0.10 10.76 109.26 0.17 5.92 0.10 184.0 Moderate
1 AP001 Secretariat, Amaravati - APPCB Amaravati Andhra Pradesh 2017-11-25 81.40 124.50 1.44 20.50 12.08 10.72 0.12 15.24 127.09 0.20 6.50 0.06 184.0 Moderate
2 AP001 Secretariat, Amaravati - APPCB Amaravati Andhra Pradesh 2017-11-26 78.32 129.06 1.26 26.00 14.85 10.28 0.14 26.96 117.44 0.22 7.95 0.08 197.0 Moderate
3 AP001 Secretariat, Amaravati - APPCB Amaravati Andhra Pradesh 2017-11-27 88.76 135.32 6.60 30.85 21.77 12.91 0.11 33.59 111.81 0.29 7.63 0.12 198.0 Moderate
4 AP001 Secretariat, Amaravati - APPCB Amaravati Andhra Pradesh 2017-11-28 64.18 104.09 2.56 28.07 17.01 11.42 0.09 19.00 138.18 0.17 5.02 0.07 188.0 Moderate
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
107706 WB013 Victoria, Kolkata - WBPCB Kolkata West Bengal 2020-06-27 8.65 16.46 NaN NaN NaN NaN 0.69 4.36 30.59 1.32 7.26 NaN 50.0 Good
107707 WB013 Victoria, Kolkata - WBPCB Kolkata West Bengal 2020-06-28 11.80 18.47 13.65 200.87 214.20 11.40 0.68 3.49 38.95 1.42 7.92 NaN 65.0 Satisfactory
107708 WB013 Victoria, Kolkata - WBPCB Kolkata West Bengal 2020-06-29 18.60 32.26 13.65 200.87 214.20 11.40 0.78 5.12 38.17 3.52 8.64 NaN 63.0 Satisfactory
107709 WB013 Victoria, Kolkata - WBPCB Kolkata West Bengal 2020-06-30 16.07 39.30 7.56 29.13 36.69 29.26 0.69 5.88 29.64 1.86 8.40 NaN 57.0 Satisfactory
107710 WB013 Victoria, Kolkata - WBPCB Kolkata West Bengal 2020-07-01 10.50 36.50 7.78 22.50 30.25 27.23 0.58 2.80 13.10 1.31 7.39 NaN 59.0 Satisfactory

107711 rows × 19 columns

StationId          0
StationName        0
City               0
State              0
Date               0
PM2.5          20417
PM10           41789
NO             15629
NO2            15058
NOx            14346
NH3            47245
CO             11386
SO2            23922
O3             24213
Benzene        30164
Toluene        37453
Xylene         84595
AQI            18958
Air_quality    18958
dtype: int64
PM2.5 PM10 NO NO2 NOx NH3 CO SO2 O3 Benzene Toluene Xylene AQI
count 87294.000000 65922.000000 92082.000000 92653.000000 93365.000000 60466.000000 96325.000000 83789.000000 83498.000000 77547.000000 70258.000000 23116.000000 88753.000000
mean 80.344685 158.258377 23.065599 35.362506 41.214568 28.824049 1.628631 12.300694 38.221072 3.383664 15.552798 2.458169 179.803004
std 76.654693 123.416377 34.558080 29.746311 45.315124 24.998420 4.488459 13.196388 39.240139 11.300496 29.826612 6.734150 131.420900
min 0.020000 0.010000 0.010000 0.010000 0.000000 0.010000 0.000000 0.010000 0.010000 0.000000 0.000000 0.000000 8.000000
25% 31.940000 70.490000 4.822500 15.130000 13.960000 11.930000 0.530000 5.040000 18.930000 0.160000 0.710000 0.000000 86.000000
50% 56.010000 122.490000 10.270000 27.250000 26.620000 23.650000 0.910000 8.930000 30.880000 1.210000 4.380000 0.400000 133.000000
75% 100.000000 208.967500 24.840000 47.030000 50.400000 38.200000 1.450000 14.900000 47.220000 3.610000 17.620000 2.120000 254.000000
max 1000.000000 1000.000000 470.000000 448.050000 467.630000 418.900000 175.810000 195.650000 963.000000 455.030000 454.850000 170.370000 2049.000000

Replacing zeroes with median values.¶

StationId          0
StationName        0
City               0
State              0
Date               0
PM2.5              0
PM10               0
NO                 0
NO2                0
NOx             4878
NH3                0
CO              7484
SO2                0
O3                 0
Benzene        12876
Toluene        10550
Xylene          6146
AQI                0
Air_quality        0
dtype: int64
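
The listing above appears to count zero readings per column (only NOx, CO, Benzene, Toluene and Xylene have any). A hedged sketch of counting and replacing them follows, on toy values. Whether the notebook computed the median with or without the zeros is not visible here, so including them is an assumption.

```python
import pandas as pd

# Toy frame with a zero reading in each column (illustrative values).
df = pd.DataFrame({
    "NOx": [0.0, 12.40, 14.85, 21.77],
    "CO": [0.10, 0.0, 0.14, 0.11],
})

# Count zero readings per column.
zero_counts = (df == 0).sum()

# Replace each zero with the median of its own column.
# Assumption: the median is taken over the column as-is, zeros included.
cleaned = df.copy()
for col in cleaned.columns:
    cleaned[col] = cleaned[col].replace(0, cleaned[col].median())

print(zero_counts)
print((cleaned == 0).sum())
```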

Checking the data via various methods for final visualisation

StationId      0
StationName    0
City           0
State          0
Date           0
PM2.5          0
PM10           0
NO             0
NO2            0
NOx            0
NH3            0
CO             0
SO2            0
O3             0
Benzene        0
Toluene        0
Xylene         0
AQI            0
Air_quality    0
dtype: int64
<class 'pandas.core.frame.DataFrame'>
Int64Index: 107711 entries, 0 to 107710
Data columns (total 19 columns):
 #   Column       Non-Null Count   Dtype         
---  ------       --------------   -----         
 0   StationId    107711 non-null  object        
 1   StationName  107711 non-null  object        
 2   City         107711 non-null  object        
 3   State        107711 non-null  object        
 4   Date         107711 non-null  datetime64[ns]
 5   PM2.5        107711 non-null  float64       
 6   PM10         107711 non-null  float64       
 7   NO           107711 non-null  float64       
 8   NO2          107711 non-null  float64       
 9   NOx          107711 non-null  float64       
 10  NH3          107711 non-null  float64       
 11  CO           107711 non-null  float64       
 12  SO2          107711 non-null  float64       
 13  O3           107711 non-null  float64       
 14  Benzene      107711 non-null  float64       
 15  Toluene      107711 non-null  float64       
 16  Xylene       107711 non-null  float64       
 17  AQI          107711 non-null  float64       
 18  Air_quality  107711 non-null  object        
dtypes: datetime64[ns](1), float64(13), object(5)
memory usage: 16.4+ MB
PM2.5 PM10 NO NO2 NOx NH3 CO SO2 O3 Benzene Toluene Xylene AQI
count 107711.000000 107711.000000 107711.000000 107711.000000 107711.000000 107711.000000 107711.000000 107711.000000 107711.000000 107711.000000 107711.000000 107711.000000 107711.000000
mean 75.731960 144.381199 21.208942 34.228378 40.476284 26.554569 1.615894 11.552082 36.570829 2.919585 12.096830 0.864530 171.565300
std 69.664187 98.111580 32.268880 27.731846 41.719310 18.905102 4.233512 11.723084 34.684843 9.604304 24.500655 3.227418 120.620077
min 0.020000 0.010000 0.010000 0.010000 0.010000 0.010000 0.010000 0.010000 0.010000 0.010000 0.010000 0.010000 8.000000
25% 37.100000 101.960000 5.660000 16.930000 18.230000 21.050000 0.690000 6.080000 22.390000 1.210000 4.380000 0.400000 95.000000
50% 56.010000 122.490000 10.270000 27.250000 26.620000 23.650000 0.910000 8.930000 30.880000 1.210000 4.380000 0.400000 133.000000
75% 84.700000 146.915000 21.010000 42.770000 45.380000 26.330000 1.360000 12.670000 41.440000 2.410000 8.200000 0.400000 216.000000
max 1000.000000 1000.000000 470.000000 448.050000 467.630000 418.900000 175.810000 195.650000 963.000000 455.030000 454.850000 170.370000 2049.000000
array(['Secretariat, Amaravati - APPCB',
       'GVM Corporation, Visakhapatnam - APPCB',
       'Railway Colony, Guwahati - APCB',
       'DRM Office Danapur, Patna - BSPCB',
       'Govt. High School Shikarpur, Patna - BSPCB',
       'IGSC Planetarium Complex, Patna - BSPCB',
       'Muradpur, Patna - BSPCB', 'Rajbansi Nagar, Patna - BSPCB',
       'Samanpura, Patna - BSPCB', 'Sector-25, Chandigarh - CPCC',
       'Alipur, Delhi - DPCC', 'Anand Vihar, Delhi - DPCC',
       'Ashok Vihar, Delhi - DPCC', 'Aya Nagar, Delhi - IMD',
       'Bawana, Delhi - DPCC', 'Burari Crossing, Delhi - IMD',
       'CRRI Mathura Road, Delhi - IMD', 'DTU, Delhi - CPCB',
       'Dr. Karni Singh Shooting Range, Delhi - DPCC',
       'Dwarka-Sector 8, Delhi - DPCC', 'East Arjun Nagar, Delhi - CPCB',
       'IGI Airport (T3), Delhi - IMD',
       'IHBAS, Dilshad Garden, Delhi - CPCB', 'ITO, Delhi - CPCB',
       'Jahangirpuri, Delhi - DPCC',
       'Jawaharlal Nehru Stadium, Delhi - DPCC',
       'Lodhi Road, Delhi - IMD',
       'Major Dhyan Chand National Stadium, Delhi - DPCC',
       'Mandir Marg, Delhi - DPCC', 'Mundka, Delhi - DPCC',
       'NSIT Dwarka, Delhi - CPCB', 'Najafgarh, Delhi - DPCC',
       'Narela, Delhi - DPCC', 'Nehru Nagar, Delhi - DPCC',
       'North Campus, DU, Delhi - IMD', 'Okhla Phase-2, Delhi - DPCC',
       'Patparganj, Delhi - DPCC', 'Punjabi Bagh, Delhi - DPCC',
       'Pusa, Delhi - DPCC', 'Pusa, Delhi - IMD',
       'R K Puram, Delhi - DPCC', 'Rohini, Delhi - DPCC',
       'Shadipur, Delhi - CPCB', 'Sirifort, Delhi - CPCB',
       'Sonia Vihar, Delhi - DPCC', 'Sri Aurobindo Marg, Delhi - DPCC',
       'Vivek Vihar, Delhi - DPCC', 'Wazirpur, Delhi - DPCC',
       'Maninagar, Ahmedabad - GPCB', 'NISE Gwal Pahari, Gurugram - IMD',
       'Sector-51, Gurugram - HSPCB', 'Teri Gram, Gurugram - HSPCB',
       'Vikas Sadan, Gurugram - HSPCB',
       'Tata Stadium, Jorapokhar - JSPCB', 'BTM Layout, Bengaluru - CPCB',
       'BWSSB Kadabesanahalli, Bengaluru - CPCB',
       'Bapuji Nagar, Bengaluru - KSPCB',
       'City Railway Station, Bengaluru - KSPCB',
       'Hebbal, Bengaluru - KSPCB', 'Hombegowda Nagar, Bengaluru - KSPCB',
       'Jayanagar 5th Block, Bengaluru - KSPCB',
       'Peenya, Bengaluru - CPCB', 'Sanegurava Halli, Bengaluru - KSPCB',
       'Silk Board, Bengaluru - KSPCB',
       'Kariavattom, Thiruvananthapuram - Kerala PCB',
       'Plammoodu, Thiruvananthapuram - Kerala PCB',
       'T T Nagar, Bhopal - MPPCB', 'Bandra, Mumbai - MPCB',
       'Borivali East, Mumbai - MPCB',
       'Chhatrapati Shivaji Intl. Airport (T2), Mumbai - MPCB',
       'Colaba, Mumbai - MPCB', 'Kurla, Mumbai - MPCB',
       'Powai, Mumbai - MPCB', 'Sion, Mumbai - MPCB',
       'Vasai West, Mumbai - MPCB', 'Vile Parle West, Mumbai - MPCB',
       'Worli, Mumbai - MPCB', 'Lumpyngngad, Shillong - Meghalaya PCB',
       'Sikulpuikawn, Aizawl - Mizoram PCB',
       'GM Office, Brajrajnagar - OSPCB',
       'Talcher Coalfields,Talcher - OSPCB',
       'Golden Temple, Amritsar - PPCB', 'Adarsh Nagar, Jaipur - RSPCB',
       'Police Commissionerate, Jaipur - RSPCB',
       'Shastri Nagar, Jaipur - RSPCB',
       'Alandur Bus Depot, Chennai - CPCB',
       'Manali Village, Chennai - TNPCB', 'Manali, Chennai - CPCB',
       'Velachery Res. Area, Chennai - CPCB',
       'SIDCO Kurichi, Coimbatore - TNPCB',
       'Bollaram Industrial Area, Hyderabad - TSPCB',
       'Central University, Hyderabad - TSPCB',
       'ICRISAT Patancheru, Hyderabad - TSPCB',
       'IDA Pashamylaram, Hyderabad - TSPCB',
       'Sanathnagar, Hyderabad - TSPCB', 'Zoo Park, Hyderabad - TSPCB',
       'Central School, Lucknow - CPCB', 'Gomti Nagar, Lucknow - UPPCB',
       'Lalbagh, Lucknow - CPCB', 'Nishant Ganj, Lucknow - UPPCB',
       'Talkatora District Industries Center, Lucknow - CPCB',
       'Ballygunge, Kolkata - WBPCB', 'Bidhannagar, Kolkata - WBPCB',
       'Fort William, Kolkata - WBPCB', 'Jadavpur, Kolkata - WBPCB',
       'Rabindra Bharati University, Kolkata - WBPCB',
       'Rabindra Sarobar, Kolkata - WBPCB', 'Victoria, Kolkata - WBPCB'],
      dtype=object)
IHBAS, Dilshad Garden, Delhi - CPCB           2009
Manali, Chennai - CPCB                        2009
NSIT Dwarka, Delhi - CPCB                     2009
Bandra, Mumbai - MPCB                         2009
Maninagar, Ahmedabad - GPCB                   2009
                                              ... 
DRM Office Danapur, Patna - BSPCB              126
Govt. High School Shikarpur, Patna - BSPCB     121
Teri Gram, Gurugram - HSPCB                    119
Sector-51, Gurugram - HSPCB                    119
Sikulpuikawn, Aizawl - Mizoram PCB             113
Name: StationName, Length: 108, dtype: int64
StationId StationName City State Date PM2.5 PM10 NO NO2 NOx NH3 CO SO2 O3 Benzene Toluene Xylene AQI Air_quality Pollution content
0 AP001 Secretariat, Amaravati - APPCB Amaravati Andhra Pradesh 2017-11-24 71.36 115.75 1.75 20.65 12.40 12.19 0.10 10.76 109.26 0.17 5.92 0.10 184.0 Moderate 360.41
1 AP001 Secretariat, Amaravati - APPCB Amaravati Andhra Pradesh 2017-11-25 81.40 124.50 1.44 20.50 12.08 10.72 0.12 15.24 127.09 0.20 6.50 0.06 184.0 Moderate 399.85
2 AP001 Secretariat, Amaravati - APPCB Amaravati Andhra Pradesh 2017-11-26 78.32 129.06 1.26 26.00 14.85 10.28 0.14 26.96 117.44 0.22 7.95 0.08 197.0 Moderate 412.56
3 AP001 Secretariat, Amaravati - APPCB Amaravati Andhra Pradesh 2017-11-27 88.76 135.32 6.60 30.85 21.77 12.91 0.11 33.59 111.81 0.29 7.63 0.12 198.0 Moderate 449.76
4 AP001 Secretariat, Amaravati - APPCB Amaravati Andhra Pradesh 2017-11-28 64.18 104.09 2.56 28.07 17.01 11.42 0.09 19.00 138.18 0.17 5.02 0.07 188.0 Moderate 389.86
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
107706 WB013 Victoria, Kolkata - WBPCB Kolkata West Bengal 2020-06-27 8.65 16.46 10.27 27.25 26.62 23.65 0.69 4.36 30.59 1.32 7.26 0.40 50.0 Good 157.52
107707 WB013 Victoria, Kolkata - WBPCB Kolkata West Bengal 2020-06-28 11.80 18.47 13.65 200.87 214.20 11.40 0.68 3.49 38.95 1.42 7.92 0.40 65.0 Satisfactory 523.25
107708 WB013 Victoria, Kolkata - WBPCB Kolkata West Bengal 2020-06-29 18.60 32.26 13.65 200.87 214.20 11.40 0.78 5.12 38.17 3.52 8.64 0.40 63.0 Satisfactory 547.61
107709 WB013 Victoria, Kolkata - WBPCB Kolkata West Bengal 2020-06-30 16.07 39.30 7.56 29.13 36.69 29.26 0.69 5.88 29.64 1.86 8.40 0.40 57.0 Satisfactory 204.88
107710 WB013 Victoria, Kolkata - WBPCB Kolkata West Bengal 2020-07-01 10.50 36.50 7.78 22.50 30.25 27.23 0.58 2.80 13.10 1.31 7.39 0.40 59.0 Satisfactory 160.34

107711 rows × 20 columns

Plotting of various aspects of data

To fix the error "'Series' object has no attribute 'iplot'", we use the cufflinks library.

Cities¶

City               0
Date               0
PM2.5           4321
PM10           10866
NO              3276
NO2             3278
NOx             3980
NH3            10061
CO              1745
SO2             3510
O3              3664
Benzene         5298
Toluene         7739
Xylene         17878
AQI             4174
Air_quality     4174
dtype: int64
PM2.5 PM10 NO NO2 NOx NH3 CO SO2 O3 Benzene Toluene Xylene AQI
count 25210.000000 18665.000000 26255.000000 26253.000000 25551.000000 19470.000000 27786.000000 26021.000000 25867.000000 24233.000000 21792.000000 11653.000000 25357.000000
mean 67.444977 118.257649 17.664483 28.488332 32.327829 23.451706 2.254926 14.658364 34.448364 3.357247 8.736095 3.101291 166.489017
std 65.132855 90.949061 23.287194 24.504528 31.770902 25.655621 7.072508 18.488735 21.743220 16.114631 20.153048 6.789023 141.084091
min 0.040000 0.010000 0.020000 0.010000 0.000000 0.010000 0.000000 0.010000 0.010000 0.000000 0.000000 0.000000 13.000000
25% 28.750000 56.200000 5.630000 11.690000 12.790000 8.490000 0.510000 5.660000 18.780000 0.120000 0.580000 0.130000 81.000000
50% 48.485000 95.710000 9.880000 21.590000 23.500000 15.800000 0.890000 9.160000 30.790000 1.070000 2.960000 0.960000 118.000000
75% 80.487500 149.890000 19.950000 37.520000 40.140000 30.000000 1.450000 15.290000 45.545000 3.090000 9.110000 3.330000 208.000000
max 949.990000 1000.000000 390.680000 362.210000 467.630000 352.890000 175.810000 193.860000 257.730000 455.030000 454.850000 170.370000 2049.000000

Replacing zeroes with median values.¶

City              0
Date              0
PM2.5             0
PM10              0
NO                0
NO2               0
NOx             757
NH3               0
CO             2413
SO2               0
O3                0
Benzene        3913
Toluene        2943
Xylene         1812
AQI               0
Air_quality       0
dtype: int64

Checking the data via various methods for visualisation.¶

City           0
Date           0
PM2.5          0
PM10           0
NO             0
NO2            0
NOx            0
NH3            0
CO             0
SO2            0
O3             0
Benzene        0
Toluene        0
Xylene         0
AQI            0
Air_quality    0
dtype: int64
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 29531 entries, 0 to 29530
Data columns (total 16 columns):
 #   Column       Non-Null Count  Dtype         
---  ------       --------------  -----         
 0   City         29531 non-null  object        
 1   Date         29531 non-null  datetime64[ns]
 2   PM2.5        29531 non-null  float64       
 3   PM10         29531 non-null  float64       
 4   NO           29531 non-null  float64       
 5   NO2          29531 non-null  float64       
 6   NOx          29531 non-null  float64       
 7   NH3          29531 non-null  float64       
 8   CO           29531 non-null  float64       
 9   SO2          29531 non-null  float64       
 10  O3           29531 non-null  float64       
 11  Benzene      29531 non-null  float64       
 12  Toluene      29531 non-null  float64       
 13  Xylene       29531 non-null  float64       
 14  AQI          29531 non-null  float64       
 15  Air_quality  29531 non-null  object        
dtypes: datetime64[ns](1), float64(13), object(2)
memory usage: 3.6+ MB
PM2.5 PM10 NO NO2 NOx NH3 CO SO2 O3 Benzene Toluene Xylene AQI
count 29531.000000 29531.000000 29531.000000 29531.000000 29531.000000 29531.000000 29531.000000 29531.000000 29531.00000 29531.000000 29531.000000 29531.000000 29531.000000
mean 64.670738 109.961189 16.800917 27.722603 31.740472 20.844825 2.246994 14.004839 33.99446 3.088684 7.517378 1.863863 159.635434
std 60.551112 73.118152 22.093196 23.205864 29.303783 21.144911 6.849184 17.446163 20.38535 14.599926 17.397788 4.372919 131.820310
min 0.040000 0.010000 0.020000 0.010000 0.030000 0.010000 0.010000 0.010000 0.01000 0.010000 0.010000 0.010000 13.000000
25% 31.870000 78.440000 6.140000 12.770000 15.710000 11.910000 0.680000 6.040000 20.42000 0.850000 2.620000 0.960000 87.000000
50% 48.485000 95.710000 9.880000 21.590000 23.500000 15.800000 0.890000 9.160000 30.79000 1.070000 2.960000 0.960000 118.000000
75% 72.920000 112.950000 17.760000 34.820000 36.255000 22.045000 1.400000 13.955000 43.00000 2.470000 6.110000 0.960000 182.000000
max 949.990000 1000.000000 390.680000 362.210000 467.630000 352.890000 175.810000 193.860000 257.73000 455.030000 454.850000 170.370000 2049.000000

Hypothesis Testing¶

Visualisation of Air Quality before Covid and After Covid.¶

During the COVID-19 period, industry and vehicles consumed less fuel. We therefore group the pollutants into two types, according to the kind of fuel consumption that produces them, and compare the data before and after COVID-19.
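
The tables in this section suggest the split point: the "before" rows run up to 2020-01-01 and the "after" rows start at 2020-01-02. A minimal sketch of such a split, assuming that cutoff and using a hypothetical frame name `cities`:

```python
import pandas as pd

# A few rows taken from the city tables in this section.
cities = pd.DataFrame({
    "City": ["Ahmedabad", "Ahmedabad", "Visakhapatnam", "Visakhapatnam"],
    "Date": pd.to_datetime(["2015-01-01", "2020-01-02", "2020-01-01", "2020-07-01"]),
    "AQI": [118.0, 162.0, 111.0, 50.0],
})

# Assumed cutoff, inferred from the date ranges shown in the output tables.
cutoff = pd.Timestamp("2020-01-01")

before_covid = cities[cities["Date"] <= cutoff]
after_covid = cities[cities["Date"] > cutoff]

print(len(before_covid), len(after_covid))
```

Boolean masking keeps the original row labels, which matches the non-contiguous index ranges reported for the two subsets.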

Before Covid¶

We create two pollutant groups:
Vehicular Pollutants = PM2.5 + PM10 + NO + NO2 + NOx + NH3 + CO
Industrial Pollutants = SO2 + O3 + Benzene + Toluene + Xylene
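
The two groups are row-wise sums of the listed columns. The sketch below applies the definitions to a single row (the first Amaravati station reading shown earlier); the helper lists `vehicular_cols` and `industrial_cols` are names introduced here for illustration.

```python
import pandas as pd

# One row of pollutant readings (first Amaravati station row shown earlier).
df = pd.DataFrame([{
    "PM2.5": 71.36, "PM10": 115.75, "NO": 1.75, "NO2": 20.65,
    "NOx": 12.40, "NH3": 12.19, "CO": 0.10,
    "SO2": 10.76, "O3": 109.26, "Benzene": 0.17, "Toluene": 5.92, "Xylene": 0.10,
}])

vehicular_cols = ["PM2.5", "PM10", "NO", "NO2", "NOx", "NH3", "CO"]
industrial_cols = ["SO2", "O3", "Benzene", "Toluene", "Xylene"]

# Row-wise sums give the two group totals; their sum is the overall total.
df["Vehicular Pollutants"] = df[vehicular_cols].sum(axis=1)
df["Industrial Pollutants"] = df[industrial_cols].sum(axis=1)
df["Total Pollutants"] = df["Vehicular Pollutants"] + df["Industrial Pollutants"]

print(df[["Vehicular Pollutants", "Industrial Pollutants", "Total Pollutants"]])
```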

<class 'pandas.core.frame.DataFrame'>
Int64Index: 24908 entries, 0 to 29348
Data columns (total 7 columns):
 #   Column                 Non-Null Count  Dtype         
---  ------                 --------------  -----         
 0   City                   24908 non-null  object        
 1   Date                   24908 non-null  datetime64[ns]
 2   AQI                    24908 non-null  float64       
 3   Air_quality            24908 non-null  object        
 4   Vehicular Pollutants   24908 non-null  float64       
 5   Industrial Pollutants  24908 non-null  float64       
 6   Total Pollutants       24908 non-null  float64       
dtypes: datetime64[ns](1), float64(4), object(2)
memory usage: 1.5+ MB

Plotting some visual data for further interpretation¶

Plotting the Most Polluted cities in the Bar form on the basis of types of pollutants and total pollutants¶

Plotting the Least Polluted cities in the Bar form on the basis of types of pollutants and total pollutants¶

Satisfaction Level of people in the cities that are majorly polluted¶

City Date AQI Air_quality Vehicular Pollutants Industrial Pollutants Total Pollutants
0 Ahmedabad 2015-01-01 118.0 Moderate 197.205 163.05 360.255
1 Ahmedabad 2015-01-02 118.0 Moderate 194.085 71.56 265.645
2 Ahmedabad 2015-01-03 118.0 Moderate 243.795 85.22 329.015
3 Ahmedabad 2015-01-04 118.0 Moderate 199.845 70.24 270.085
4 Ahmedabad 2015-01-05 118.0 Moderate 263.375 107.32 370.695
... ... ... ... ... ... ... ...
29344 Visakhapatnam 2019-12-28 110.0 Moderate 240.130 65.65 305.780
29345 Visakhapatnam 2019-12-29 133.0 Moderate 180.610 93.24 273.850
29346 Visakhapatnam 2019-12-30 92.0 Satisfactory 216.790 101.37 318.160
29347 Visakhapatnam 2019-12-31 92.0 Satisfactory 222.470 94.36 316.830
29348 Visakhapatnam 2020-01-01 111.0 Moderate 235.200 92.51 327.710

24908 rows × 7 columns

Before hypothesis testing, we normalise and standardise the dataset¶

              Pollutants        Mean      Variance
0   Vehicular Pollutants  284.058727  28761.160744
1  Industrial Pollutants   60.332131   1715.762624
2       Total Pollutants  344.390859  35606.345506

Why is normalization important?¶

Normalization is generally required when attributes are on different scales. Otherwise, an equally important attribute on a smaller scale can be diluted by attributes whose values are on a larger scale.

In simple words, when attributes have values on different scales, models built during data mining can perform poorly. Normalizing brings all the attributes onto the same scale.

We use standard scaling, which sets the mean of each column to 0 and its variance to 1.
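
Standard scaling replaces each value x with z = (x - mean) / std, computed per column. Below is a sketch in plain pandas on the first five "before" rows; the notebook may have used a library scaler instead, which is an assumption either way.

```python
import numpy as np
import pandas as pd

# First five rows of the "before COVID" pollutant table.
sample = pd.DataFrame({
    "Vehicular Pollutants": [197.205, 194.085, 243.795, 199.845, 263.375],
    "Industrial Pollutants": [163.05, 71.56, 85.22, 70.24, 107.32],
    "Total Pollutants": [360.255, 265.645, 329.015, 270.085, 370.695],
})

# Standard scaling: subtract the column mean, divide by the population
# standard deviation (ddof=0), so each column has mean 0 and variance 1.
scaled = (sample - sample.mean()) / sample.std(ddof=0)

print(scaled.mean())        # approximately 0 for every column
print(scaled.std(ddof=0))   # 1 for every column
print(scaled.std(ddof=1))   # slightly above 1: sqrt(n / (n - 1))
```

pandas' `std()` defaults to `ddof=1`, so a column scaled with the population standard deviation reports a value slightly above 1. That would be consistent with the 1.00002 shown in the output that follows, since sqrt(24908 / 24907) ≈ 1.00002.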

Vehicular Pollutants Industrial Pollutants Total Pollutants
0 197.205 163.05 360.255
1 194.085 71.56 265.645
2 243.795 85.22 329.015
3 199.845 70.24 270.085
4 263.375 107.32 370.695
... ... ... ...
29344 240.130 65.65 305.780
29345 180.610 93.24 273.850
29346 216.790 101.37 318.160
29347 222.470 94.36 316.830
29348 235.200 92.51 327.710

24908 rows × 3 columns

Vehicular Pollutants Industrial Pollutants Total Pollutants
0 -0.512146 2.479854 0.084074
1 -0.530544 0.271067 -0.417323
2 -0.237421 0.600852 -0.081486
3 -0.496579 0.239200 -0.393793
4 -0.121965 1.134399 0.139402
Vehicular Pollutants     2.464706e-16
Industrial Pollutants   -2.707183e-16
Total Pollutants        -3.765523e-17
dtype: float64
Vehicular Pollutants     1.00002
Industrial Pollutants    1.00002
Total Pollutants         1.00002
dtype: float64
Vehicular Pollutants     2.354917
Industrial Pollutants    2.941972
Total Pollutants         2.157560
dtype: float64
Vehicular Pollutants Industrial Pollutants Total Pollutants
count 24908.000000 24908.000000 24908.000000
mean 284.058727 60.332131 344.390859
std 169.591158 41.421765 188.696437
min 7.880000 3.340000 21.570000
25% 192.005000 37.290000 235.686250
50% 231.715000 46.050000 285.120000
75% 332.065000 70.880000 409.451250
max 2137.260000 776.150000 2326.750000
[Figure: 2×2 grid of plots with panels titled "Vehicular Pollutants", "Industrial Pollutants" and "Total Pollutants" (fourth panel empty); plotted twice]

The hypothesis test is performed on Total Pollutants, with the significance level (alpha, the critical p-value) taken as 0.05.

We treat the air as bad quality when the total pollutant level is greater than 320 µg/m³.

Based on this threshold, we state the hypotheses:

Null hypothesis (H0): mean Total Pollutants >= 320
Alternate hypothesis (H1): mean Total Pollutants < 320

Calculating the Z Score value:

0.12925977347574807

Calculating the p-value:

0.4485760502293482
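
The two reported numbers can be reproduced from the summary statistics above if the z score is computed as (mean - 320) / std and the p-value as the upper-tail normal probability. That the notebook computed them exactly this way is an inference from the values, not something shown in this export. For comparison, the sketch also computes the z score with the standard error std / sqrt(n), which is the usual denominator in a one-sample z-test for a mean.

```python
import math

# Summary statistics of Total Pollutants before COVID (from the table above).
mean_total = 344.390859
std_total = 188.696437
n = 24908
threshold = 320.0

def normal_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * math.erfc(-x / math.sqrt(2))

# z score as it appears to have been computed: distance from the
# threshold in units of the sample standard deviation.
z = (mean_total - threshold) / std_total
p_upper = 1.0 - normal_cdf(z)

# z score using the standard error of the mean, for comparison.
z_se = (mean_total - threshold) / (std_total / math.sqrt(n))
p_lower_se = normal_cdf(z_se)  # lower tail, matching H1: mean < 320

print(z, p_upper)
print(z_se, p_lower_se)
```

Under either calculation the p-value is above 0.05, so the decision about H0 is the same.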

The p-value is greater than 0.05, so we fail to reject the null hypothesis.

This is consistent with poor air quality before COVID-19, which is what we would expect.

After Covid¶

We create two pollutant groups:
Vehicular Pollutants = PM2.5 + PM10 + NO + NO2 + NOx + NH3 + CO
Industrial Pollutants = SO2 + O3 + Benzene + Toluene + Xylene

<class 'pandas.core.frame.DataFrame'>
Int64Index: 4623 entries, 1827 to 29530
Data columns (total 7 columns):
 #   Column                 Non-Null Count  Dtype         
---  ------                 --------------  -----         
 0   City                   4623 non-null   object        
 1   Date                   4623 non-null   datetime64[ns]
 2   AQI                    4623 non-null   float64       
 3   Air_quality            4623 non-null   object        
 4   Vehicular Pollutants   4623 non-null   float64       
 5   Industrial Pollutants  4623 non-null   float64       
 6   Total Pollutants       4623 non-null   float64       
dtypes: datetime64[ns](1), float64(4), object(2)
memory usage: 288.9+ KB

Plotting some visual data for further interpretation¶

Plotting the Most Polluted cities in the Bar form on the basis of types of pollutants and total pollutants¶

Plotting the Least Polluted cities in the Bar form on the basis of types of pollutants and total pollutants¶

Satisfaction Level of people in the cities that are majorly polluted¶

City Date AQI Air_quality Vehicular Pollutants Industrial Pollutants Total Pollutants
1827 Ahmedabad 2020-01-02 162.0 Moderate 248.62 85.00 333.62
1828 Ahmedabad 2020-01-03 220.0 Poor 256.23 97.88 354.11
1829 Ahmedabad 2020-01-04 254.0 Poor 276.04 100.41 376.45
1830 Ahmedabad 2020-01-05 255.0 Poor 219.89 106.40 326.29
1831 Ahmedabad 2020-01-06 175.0 Moderate 217.00 98.16 315.16
... ... ... ... ... ... ... ...
29526 Visakhapatnam 2020-06-27 41.0 Good 131.18 46.89 178.07
29527 Visakhapatnam 2020-06-28 70.0 Satisfactory 156.99 46.19 203.18
29528 Visakhapatnam 2020-06-29 68.0 Satisfactory 151.14 40.36 191.50
29529 Visakhapatnam 2020-06-30 54.0 Satisfactory 129.27 43.13 172.40
29530 Visakhapatnam 2020-07-01 50.0 Good 128.09 24.14 152.23

4623 rows × 7 columns

Before hypothesis testing, we normalise and standardise the dataset

              Pollutants        Mean      Variance
0   Vehicular Pollutants  219.726823  18957.688067
1  Industrial Pollutants   61.207863   4191.823405
2       Total Pollutants  280.934686  23610.013374

We use standard scaling, which sets the mean of each column to 0 and its variance to 1.¶

Vehicular Pollutants Industrial Pollutants Total Pollutants
1827 248.62 85.00 333.62
1828 256.23 97.88 354.11
1829 276.04 100.41 376.45
1830 219.89 106.40 326.29
1831 217.00 98.16 315.16
... ... ... ...
29526 131.18 46.89 178.07
29527 156.99 46.19 203.18
29528 151.14 40.36 191.50
29529 129.27 43.13 172.40
29530 128.09 24.14 152.23

4623 rows × 3 columns

Vehicular Pollutants Industrial Pollutants Total Pollutants
0 0.209870 0.367518 0.342917
1 0.265146 0.566476 0.476281
2 0.409039 0.605557 0.621687
3 0.001185 0.698085 0.295207
4 -0.019807 0.570801 0.222765
Vehicular Pollutants     1.229579e-17
Industrial Pollutants   -4.303525e-17
Total Pollutants         6.762682e-17
dtype: float64
Vehicular Pollutants     1.000108
Industrial Pollutants    1.000108
Total Pollutants         1.000108
dtype: float64
Vehicular Pollutants      1.872608
Industrial Pollutants    10.533733
Total Pollutants          1.835976
dtype: float64
Vehicular Pollutants      1.872608
Industrial Pollutants    10.533733
Total Pollutants          1.835976
dtype: float64
Vehicular Pollutants Industrial Pollutants Total Pollutants
count 4623.000000 4623.000000 4623.000000
mean 219.726823 61.207863 280.934686
std 137.686920 64.744292 153.655502
min 15.790000 3.980000 38.910000
25% 126.520000 37.800000 174.445000
50% 189.730000 50.840000 251.860000
75% 270.150000 71.225000 342.095000
max 1266.900000 969.380000 1298.770000
[Figure: grid of plots titled Vehicular Pollutants, Industrial Pollutants, and Total Pollutants (displayed twice)]

The hypothesis test is performed on Total Pollutants, with the significance level (alpha, the critical p-value) set to 0.05.

Air with a total pollutant concentration greater than 320 µg/m³ is considered to be of bad quality.

Based on this threshold, we state the hypotheses as:

Null Hypothesis (H0): mean Total Pollutants >= 320

Alternate Hypothesis (H1): mean Total Pollutants < 320

Calculating the Z-Score value:

-0.25423960141421537

Calculating the p-value:

0.39965522897962547

Here the p-value is greater than 0.05, so we fail to reject the null hypothesis.

This is consistent with air quality remaining poor after COVID-19 as well.
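The two printed numbers can be reproduced from the summary statistics reported earlier for Total Pollutants. The sketch below is an illustration rather than the notebook's own code. It shows that the printed Z-score equals the difference from 320 divided by the sample standard deviation, and, as a labelled assumption, what the statistic would be if a one-sample Z-test on the mean (dividing by the standard error) were intended instead.

```python
import math


def normal_cdf(z):
    """Standard normal CDF, written with erfc so small tail values stay accurate."""
    return 0.5 * math.erfc(-z / math.sqrt(2.0))


# Values from the summary statistics of "Total Pollutants".
n = 4623
sample_mean = 280.934686
sample_std = 153.655502
threshold = 320.0  # H0: mean >= 320, H1: mean < 320

# Reproduces the printed values: difference divided by the sample standard deviation.
z_printed = (sample_mean - threshold) / sample_std
p_printed = normal_cdf(z_printed)  # left-tailed p-value

# Assumption: a one-sample Z-test on the mean divides by the standard error instead.
standard_error = sample_std / math.sqrt(n)
z_mean = (sample_mean - threshold) / standard_error
p_mean = normal_cdf(z_mean)

print(f"printed form:        z={z_printed:.4f}, p={p_printed:.4f}")
print(f"standard-error form: z={z_mean:.2f}, p={p_mean:.3g}")
```

With the standard-error form the statistic lies far in the left tail, so the outcome of the test depends on which denominator is intended. That choice is worth checking before relying on the conclusion.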

Predicting the AQI¶

Here, we predict the AQI in two ways:
1) Predicting the yearly AQI for India as a whole using a Linear Regression model
2) Forecasting the AQI of individual cities for the upcoming years

1. Predicting the yearly AQI for India as a whole using Linear Regression¶

City Date PM2.5 PM10 NO NO2 NOx NH3 CO SO2 O3 Benzene Toluene Xylene AQI Air_quality
0 Ahmedabad 2015-01-01 48.485 95.71 0.92 18.22 17.15 15.80 0.92 27.64 133.36 1.07 0.02 0.96 118.0 Moderate
1 Ahmedabad 2015-01-02 48.485 95.71 0.97 15.69 16.46 15.80 0.97 24.55 34.06 3.68 5.50 3.77 118.0 Moderate
2 Ahmedabad 2015-01-03 48.485 95.71 17.40 19.30 29.70 15.80 17.40 29.07 30.70 6.80 16.40 2.25 118.0 Moderate
3 Ahmedabad 2015-01-04 48.485 95.71 1.70 18.48 17.97 15.80 1.70 18.59 36.08 4.43 10.14 1.00 118.0 Moderate
4 Ahmedabad 2015-01-05 48.485 95.71 22.10 21.42 37.76 15.80 22.10 39.33 39.31 7.01 18.89 2.78 118.0 Moderate
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
29526 Visakhapatnam 2020-06-27 15.020 50.94 7.68 25.06 19.54 12.47 0.47 8.55 23.30 2.24 12.07 0.73 41.0 Good
29527 Visakhapatnam 2020-06-28 24.380 74.09 3.42 26.06 16.53 11.99 0.52 12.72 30.14 0.74 2.21 0.38 70.0 Satisfactory
29528 Visakhapatnam 2020-06-29 22.910 65.73 3.45 29.53 18.33 10.71 0.48 8.42 30.96 0.01 0.01 0.96 68.0 Satisfactory
29529 Visakhapatnam 2020-06-30 16.640 49.97 4.05 29.26 18.80 10.03 0.52 9.84 28.30 1.07 2.96 0.96 54.0 Satisfactory
29530 Visakhapatnam 2020-07-01 15.000 66.00 0.40 26.85 14.05 5.20 0.59 2.10 17.05 1.07 2.96 0.96 50.0 Good

29531 rows × 16 columns

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 29531 entries, 0 to 29530
Data columns (total 16 columns):
 #   Column       Non-Null Count  Dtype         
---  ------       --------------  -----         
 0   City         29531 non-null  object        
 1   Date         29531 non-null  datetime64[ns]
 2   PM2.5        29531 non-null  float64       
 3   PM10         29531 non-null  float64       
 4   NO           29531 non-null  float64       
 5   NO2          29531 non-null  float64       
 6   NOx          29531 non-null  float64       
 7   NH3          29531 non-null  float64       
 8   CO           29531 non-null  float64       
 9   SO2          29531 non-null  float64       
 10  O3           29531 non-null  float64       
 11  Benzene      29531 non-null  float64       
 12  Toluene      29531 non-null  float64       
 13  Xylene       29531 non-null  float64       
 14  AQI          29531 non-null  float64       
 15  Air_quality  29531 non-null  object        
dtypes: datetime64[ns](1), float64(13), object(2)
memory usage: 3.6+ MB
Date Month Year AQI_avg
0 2015-01-01 1 2015 177.000000
1 2015-01-02 1 2015 174.000000
2 2015-01-03 1 2015 122.166667
3 2015-01-04 1 2015 146.714286
4 2015-01-05 1 2015 147.571429
... ... ... ... ...
2004 2020-06-27 6 2020 74.346154
2005 2020-06-28 6 2020 79.038462
2006 2020-06-29 6 2020 78.000000
2007 2020-06-30 6 2020 72.230769
2008 2020-07-01 7 2020 89.615385

2009 rows × 4 columns

Predicting the AQI taking all years into consideration¶

Date Month Year AQI_avg
0 2015-01-01 1 2015 177.000000
1 2015-01-02 1 2015 174.000000
2 2015-01-03 1 2015 122.166667
3 2015-01-04 1 2015 146.714286
4 2015-01-05 1 2015 147.571429
... ... ... ... ...
2004 2020-06-27 6 2020 74.346154
2005 2020-06-28 6 2020 79.038462
2006 2020-06-29 6 2020 78.000000
2007 2020-06-30 6 2020 72.230769
2008 2020-07-01 7 2020 89.615385

2009 rows × 4 columns

[Figure: plot with x-axis AQI_avg]
array([[ 1.        ,  1.33630621],
       [ 1.        ,  0.80178373],
       [ 1.        ,  0.26726124],
       [ 1.        , -0.26726124],
       [ 1.        , -0.80178373],
       [ 1.        , -1.33630621]])
Gradient Descent: 161.82, -20.54
Year AQI_avg Actual Predicted
5 2020 114.927696 114.927696 134.372270
4 2019 156.116369 156.116369 145.351362
3 2018 176.373409 176.373409 156.330454
2 2017 164.348404 164.348404 167.309546
1 2016 178.330358 178.330358 178.288638
0 2015 180.848065 180.848065 189.267730
12.749887150652524
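The fit above can be reproduced with a short batch gradient descent on the six yearly averages. This is a minimal sketch, not the notebook's code: the function name `gradient_descent`, the learning rate, and the iteration count are assumptions, and the final printed number is interpreted here as the root-mean-square error of the fit.

```python
import math
import statistics

# Yearly national AQI averages copied from the table above.
years = [2015, 2016, 2017, 2018, 2019, 2020]
aqi_avg = [180.848065, 178.330358, 164.348404, 176.373409, 156.116369, 114.927696]

# Standardise the year feature with the sample standard deviation (ddof=1);
# this gives the +/-1.336, +/-0.802, +/-0.267 values of the design matrix.
year_mean = statistics.mean(years)
year_std = statistics.stdev(years)
x = [(year - year_mean) / year_std for year in years]


def gradient_descent(x, y, learning_rate=0.1, iterations=2000):
    """Fit y ~ intercept + slope * x by batch gradient descent on the squared error."""
    intercept, slope = 0.0, 0.0
    n = len(x)
    for _ in range(iterations):
        residuals = [intercept + slope * xi - yi for xi, yi in zip(x, y)]
        grad_intercept = sum(residuals) / n
        grad_slope = sum(r * xi for r, xi in zip(residuals, x)) / n
        intercept -= learning_rate * grad_intercept
        slope -= learning_rate * grad_slope
    return intercept, slope


intercept, slope = gradient_descent(x, aqi_avg)
predicted = [intercept + slope * xi for xi in x]
rmse = math.sqrt(sum((p - a) ** 2 for p, a in zip(predicted, aqi_avg)) / len(aqi_avg))

print(f"intercept={intercept:.2f}, slope={slope:.2f}, rmse={rmse:.2f}")
```

The intercept is the mean of the six yearly averages, and the slope is the change in AQI per one standard deviation of the year (about 1.87 years), so the fitted line drops by roughly 11 AQI points per calendar year.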

Predicting the AQI by taking the years before COVID-19¶

Date Month Year AQI_avg
0 2015-01-01 1 2015 177.000000
1 2015-01-02 1 2015 174.000000
2 2015-01-03 1 2015 122.166667
3 2015-01-04 1 2015 146.714286
4 2015-01-05 1 2015 147.571429
... ... ... ... ...
1822 2019-12-28 12 2019 183.434783
1823 2019-12-29 12 2019 204.304348
1824 2019-12-30 12 2019 221.565217
1825 2019-12-31 12 2019 204.956522
1826 2020-01-01 1 2020 211.217391

1827 rows × 4 columns

[Figure: plot with x-axis AQI_avg]
array([[ 1.        ,  1.33630621],
       [ 1.        ,  0.80178373],
       [ 1.        ,  0.26726124],
       [ 1.        , -0.26726124],
       [ 1.        , -0.80178373],
       [ 1.        , -1.33630621]])
Gradient Descent: 177.87, 5.20
Year AQI_avg Actual Predicted
5 2020 114.927696 211.217391 184.818792
4 2019 156.116369 156.116369 182.039275
3 2018 176.373409 176.373409 179.259758
2 2017 164.348404 164.348404 176.480242
1 2016 178.330358 178.330358 173.700725
0 2015 180.848065 180.848065 170.921208
16.55481596210976

Predicting the AQI by taking the years after COVID-19¶

Date Month Year AQI_avg
1826 2020-01-01 1 2020 211.217391
1827 2020-01-02 1 2020 187.260870
1828 2020-01-03 1 2020 163.739130
1829 2020-01-04 1 2020 146.956522
1830 2020-01-05 1 2020 160.521739
... ... ... ... ...
2004 2020-06-27 6 2020 74.346154
2005 2020-06-28 6 2020 79.038462
2006 2020-06-29 6 2020 78.000000
2007 2020-06-30 6 2020 72.230769
2008 2020-07-01 7 2020 89.615385

183 rows × 4 columns

<class 'pandas.core.frame.DataFrame'>
Int64Index: 183 entries, 1826 to 2008
Data columns (total 4 columns):
 #   Column   Non-Null Count  Dtype         
---  ------   --------------  -----         
 0   Date     183 non-null    datetime64[ns]
 1   Month    183 non-null    int64         
 2   Year     183 non-null    int64         
 3   AQI_avg  183 non-null    float64       
dtypes: datetime64[ns](1), float64(1), int64(2)
memory usage: 7.1 KB
[Figure: plot with x-axis AQI_avg]
array([[ 1.        ,  1.38873015],
       [ 1.        ,  0.9258201 ],
       [ 1.        ,  0.46291005],
       [ 1.        ,  0.        ],
       [ 1.        , -0.46291005],
       [ 1.        , -0.9258201 ],
       [ 1.        , -1.38873015]])
Gradient Descent: 111.53, -31.99
Year AQI_avg Actual Predicted
0 2015.0 180.848065 167.450940 155.955477
1 2016.0 178.330358 157.586207 141.146985
2 2017.0 164.348404 110.869132 126.338492
3 2018.0 176.373409 88.646154 111.530000
4 2019.0 156.116369 88.243176 96.721508
5 2020.0 114.927696 78.310256 81.913015
6 NaN NaN 89.615385 67.104523
15.84282666114951

2. Forecasting the AQI for each city for the upcoming years¶

City Date PM2.5 PM10 NO NO2 NOx NH3 CO SO2 O3 Benzene Toluene Xylene AQI Air_quality
0 Ahmedabad 2015-01-01 48.485 95.71 0.92 18.22 17.15 15.80 0.92 27.64 133.36 1.07 0.02 0.96 118.0 Moderate
1 Ahmedabad 2015-01-02 48.485 95.71 0.97 15.69 16.46 15.80 0.97 24.55 34.06 3.68 5.50 3.77 118.0 Moderate
2 Ahmedabad 2015-01-03 48.485 95.71 17.40 19.30 29.70 15.80 17.40 29.07 30.70 6.80 16.40 2.25 118.0 Moderate
3 Ahmedabad 2015-01-04 48.485 95.71 1.70 18.48 17.97 15.80 1.70 18.59 36.08 4.43 10.14 1.00 118.0 Moderate
4 Ahmedabad 2015-01-05 48.485 95.71 22.10 21.42 37.76 15.80 22.10 39.33 39.31 7.01 18.89 2.78 118.0 Moderate
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
29526 Visakhapatnam 2020-06-27 15.020 50.94 7.68 25.06 19.54 12.47 0.47 8.55 23.30 2.24 12.07 0.73 41.0 Good
29527 Visakhapatnam 2020-06-28 24.380 74.09 3.42 26.06 16.53 11.99 0.52 12.72 30.14 0.74 2.21 0.38 70.0 Satisfactory
29528 Visakhapatnam 2020-06-29 22.910 65.73 3.45 29.53 18.33 10.71 0.48 8.42 30.96 0.01 0.01 0.96 68.0 Satisfactory
29529 Visakhapatnam 2020-06-30 16.640 49.97 4.05 29.26 18.80 10.03 0.52 9.84 28.30 1.07 2.96 0.96 54.0 Satisfactory
29530 Visakhapatnam 2020-07-01 15.000 66.00 0.40 26.85 14.05 5.20 0.59 2.10 17.05 1.07 2.96 0.96 50.0 Good

29531 rows × 16 columns

array(['Ahmedabad', 'Aizawl', 'Amaravati', 'Amritsar', 'Bengaluru',
       'Bhopal', 'Brajrajnagar', 'Chandigarh', 'Chennai', 'Coimbatore',
       'Delhi', 'Ernakulam', 'Gurugram', 'Guwahati', 'Hyderabad',
       'Jaipur', 'Jorapokhar', 'Kochi', 'Kolkata', 'Lucknow', 'Mumbai',
       'Patna', 'Shillong', 'Talcher', 'Thiruvananthapuram',
       'Visakhapatnam'], dtype=object)
Ahmedabad             2009
Delhi                 2009
Mumbai                2009
Bengaluru             2009
Lucknow               2009
Chennai               2009
Hyderabad             2006
Patna                 1858
Gurugram              1679
Visakhapatnam         1462
Amritsar              1221
Jorapokhar            1169
Jaipur                1114
Thiruvananthapuram    1112
Amaravati              951
Brajrajnagar           938
Talcher                925
Kolkata                814
Guwahati               502
Coimbatore             386
Shillong               310
Chandigarh             304
Bhopal                 289
Ernakulam              162
Kochi                  162
Aizawl                 113
Name: City, dtype: int64

The AQI of India seems to vary sporadically between local regions but, as we saw, shows a seasonal trend around the monsoon. For this reason, Prophet was chosen, as it is designed to learn seasonality in time-series analysis.
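A per-city forecast with Prophet can be sketched as below. This is an illustration under stated assumptions, not the notebook's code: the function name `forecast_city_aqi` is hypothetical, Prophet is used with its default settings because the notebook's configuration is not shown, and the 365-day horizon is inferred from the forecast tables in this section, which end one year after the last observation. The column names `City`, `Date`, and `AQI` are those of the dataset shown above.

```python
def forecast_city_aqi(df, city, periods=365):
    """Fit Prophet on one city's daily AQI and forecast `periods` days ahead.

    `df` is expected to be a pandas DataFrame with 'City', 'Date' (datetime)
    and 'AQI' columns, like the cleaned dataset shown above.
    """
    # Imported lazily so this sketch can be loaded without Prophet installed.
    from prophet import Prophet

    # Prophet expects the timestamp column to be named 'ds' and the target 'y'.
    city_rows = df.loc[df["City"] == city, ["Date", "AQI"]]
    train = city_rows.rename(columns={"Date": "ds", "AQI": "y"})

    model = Prophet()  # assumption: default trend and seasonality settings
    model.fit(train)

    # Extend the observed dates by `periods` days and predict over the full range.
    future = model.make_future_dataframe(periods=periods)
    forecast = model.predict(future)
    return model, forecast[["ds", "yhat", "yhat_lower", "yhat_upper"]]
```

Called as `forecast_city_aqi(df, "Delhi")`, this would return the fitted model together with a table in the same `ds`, `yhat`, `yhat_lower`, `yhat_upper` layout as the forecast outputs shown for each city.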

I) Forecasting Delhi's AQI for the upcoming years by using the dataset available¶

City Date PM2.5 PM10 NO NO2 NOx NH3 CO SO2 O3 Benzene Toluene Xylene AQI Air_quality
10229 Delhi 2015-01-01 313.22 607.98 69.16 36.39 110.59 33.85 15.20 9.25 41.68 14.36 24.86 9.84 472.0 Severe
10230 Delhi 2015-01-02 186.18 269.55 62.09 32.87 88.14 31.83 9.54 6.65 29.97 10.55 20.09 4.29 454.0 Severe
10231 Delhi 2015-01-03 87.18 131.90 25.73 30.31 47.95 69.55 10.61 2.65 19.71 3.91 10.23 1.99 143.0 Moderate
10232 Delhi 2015-01-04 151.84 241.84 25.01 36.91 48.62 130.36 11.54 4.63 25.36 4.26 9.71 3.34 319.0 Very Poor
10233 Delhi 2015-01-05 146.60 219.13 14.01 34.92 38.25 122.88 9.20 3.33 23.20 2.80 6.21 2.96 325.0 Very Poor
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
12233 Delhi 2020-06-27 39.80 155.94 10.88 21.46 22.47 31.43 0.87 10.38 18.88 1.69 19.99 0.43 112.0 Moderate
12234 Delhi 2020-06-28 59.52 308.65 12.67 21.60 23.86 29.27 0.94 10.70 18.05 1.71 25.13 1.74 196.0 Moderate
12235 Delhi 2020-06-29 44.86 184.12 10.50 21.57 21.94 27.97 0.88 11.58 26.61 2.13 23.80 1.13 233.0 Poor
12236 Delhi 2020-06-30 39.80 91.98 5.99 17.96 15.44 28.48 0.84 10.51 37.29 1.57 16.37 0.49 114.0 Moderate
12237 Delhi 2020-07-01 54.01 128.66 6.33 21.05 16.81 29.06 0.97 11.15 29.73 2.03 23.57 0.65 101.0 Moderate

2009 rows × 16 columns

Date AQI
0 2015-01-01 472.0
1 2015-01-02 454.0
2 2015-01-03 143.0
3 2015-01-04 319.0
4 2015-01-05 325.0
... ... ...
2004 2020-06-27 112.0
2005 2020-06-28 196.0
2006 2020-06-29 233.0
2007 2020-06-30 114.0
2008 2020-07-01 101.0

2009 rows × 2 columns

ds
2369 2021-06-27
2370 2021-06-28
2371 2021-06-29
2372 2021-06-30
2373 2021-07-01
ds yhat yhat_lower yhat_upper
2369 2021-06-27 15.151967 -72.179657 107.341479
2370 2021-06-28 8.304171 -81.092528 98.563748
2371 2021-06-29 7.531971 -81.042863 95.885354
2372 2021-06-30 8.359923 -80.456249 92.903216
2373 2021-07-01 8.126453 -76.932192 108.591637
Cross Validation accuracy: 67.97047760703944

With this model we obtain a cross-validation accuracy of 67.97 for Delhi, which we consider adequate. The model can therefore be used to forecast AQI trends for Delhi over the upcoming years.
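The notebook output does not show how the "Cross Validation accuracy" figure is defined. One common convention, assumed here purely for illustration, is 100 × (1 − MAPE), where MAPE is the mean absolute percentage error over the cross-validation predictions. The function name `mape_accuracy` and the example values below are made up for the sketch and are not taken from the dataset.

```python
def mape_accuracy(actual, predicted):
    """Return 100 * (1 - MAPE); an assumed definition, not confirmed by the notebook."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("actual and predicted must be non-empty and the same length")
    # Absolute percentage error for each (actual, predicted) pair.
    errors = [abs(a - p) / abs(a) for a, p in zip(actual, predicted)]
    return 100.0 * (1.0 - sum(errors) / len(errors))


# Made-up values for illustration only.
example_actual = [100.0, 200.0, 150.0]
example_predicted = [90.0, 220.0, 150.0]
accuracy = mape_accuracy(example_actual, example_predicted)
print(round(accuracy, 2))
```

Under this definition a score of about 68 would mean the predictions are off by roughly 32% of the observed AQI on average, which is useful context when judging whether a given score is good enough.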

Printing the trend of AQI in Delhi for upcoming years and finding the yearly, monthly, and weekly behaviour.

II) Forecasting Patna's AQI for the upcoming years by using the dataset available¶

City Date PM2.5 PM10 NO NO2 NOx NH3 CO SO2 O3 Benzene Toluene Xylene AQI Air_quality
23864 Patna 2015-06-01 48.485 95.71 14.41 25.06 39.32 15.80 1.56 1.80 8.89 1.07 0.29 0.96 118.0 Moderate
23865 Patna 2015-06-02 48.485 95.71 25.00 22.48 47.50 15.80 2.35 9.69 9.90 0.08 0.83 0.09 118.0 Moderate
23866 Patna 2015-06-03 48.485 95.71 14.29 17.16 29.81 15.80 1.69 20.61 12.63 1.07 0.33 0.96 118.0 Moderate
23867 Patna 2015-06-04 48.485 95.71 13.03 15.62 28.63 15.80 1.20 4.35 9.77 0.01 0.28 0.96 118.0 Moderate
23868 Patna 2015-06-05 48.485 95.71 10.40 10.36 20.14 15.80 1.29 7.22 11.90 1.07 0.15 0.96 118.0 Moderate
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
25717 Patna 2020-06-27 17.710 63.73 9.47 23.01 22.28 1.91 0.87 3.63 23.39 1.09 3.07 0.97 65.0 Satisfactory
25718 Patna 2020-06-28 19.270 57.42 30.19 18.13 36.76 2.05 0.72 3.92 17.37 1.18 2.90 1.24 82.0 Satisfactory
25719 Patna 2020-06-29 17.240 42.83 42.40 20.51 47.69 2.26 0.88 3.60 17.50 1.51 4.91 1.74 88.0 Satisfactory
25720 Patna 2020-06-30 29.760 60.68 42.12 27.50 52.04 1.59 0.83 3.91 21.70 1.58 8.59 2.02 93.0 Satisfactory
25721 Patna 2020-07-01 35.420 57.82 44.50 31.15 57.72 1.14 0.82 3.99 25.76 1.73 5.50 2.14 98.0 Satisfactory

1858 rows × 16 columns

ds
2218 2021-06-27
2219 2021-06-28
2220 2021-06-29
2221 2021-06-30
2222 2021-07-01
ds yhat yhat_lower yhat_upper
2218 2021-06-27 11.754445 -62.226604 82.207675
2219 2021-06-28 8.512360 -67.814273 82.807106
2220 2021-06-29 13.255852 -59.275857 86.541985
2221 2021-06-30 13.006771 -59.560854 85.133394
2222 2021-07-01 9.804651 -63.738632 83.167927
Cross Validation accuracy: 63.38596674035115

With this model we obtain a cross-validation accuracy of 63.39 for Patna, which we consider adequate. The model can therefore be used to forecast AQI trends for Patna over the upcoming years.

Printing the trend of AQI in Patna for upcoming years and finding the yearly, monthly, and weekly behaviour.

III) Forecasting Bengaluru's AQI for the upcoming years by using the dataset available¶

City Date PM2.5 PM10 NO NO2 NOx NH3 CO SO2 O3 Benzene Toluene Xylene AQI Air_quality
4294 Bengaluru 2015-01-01 48.485 95.71 3.26 17.33 10.88 20.36 0.33 3.54 10.73 0.56 4.64 0.96 118.0 Moderate
4295 Bengaluru 2015-01-02 48.485 95.71 6.05 19.73 14.14 23.74 1.35 3.97 22.77 0.65 5.31 0.96 118.0 Moderate
4296 Bengaluru 2015-01-03 48.485 95.71 11.91 19.88 20.72 4.32 17.40 13.61 12.03 0.53 19.25 0.96 118.0 Moderate
4297 Bengaluru 2015-01-04 48.485 95.71 7.45 21.61 16.88 0.87 5.05 6.52 17.70 0.55 7.47 0.96 118.0 Moderate
4298 Bengaluru 2015-01-05 48.485 95.71 9.52 22.17 21.76 31.38 1.83 4.71 12.72 0.40 4.36 0.96 118.0 Moderate
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
6298 Bengaluru 2020-06-27 16.600 29.48 3.06 13.68 13.07 6.88 0.67 7.29 15.69 0.21 1.18 0.96 51.0 Satisfactory
6299 Bengaluru 2020-06-28 20.440 26.34 2.69 10.33 10.58 6.58 0.66 6.60 17.59 0.12 0.94 0.96 61.0 Satisfactory
6300 Bengaluru 2020-06-29 28.680 29.27 3.62 12.12 12.94 6.80 0.56 6.33 16.99 0.17 1.17 0.96 65.0 Satisfactory
6301 Bengaluru 2020-06-30 14.470 24.26 4.61 12.69 15.00 6.82 0.56 6.45 16.08 0.18 0.86 0.96 63.0 Satisfactory
6302 Bengaluru 2020-07-01 17.500 30.48 3.95 13.25 14.83 7.42 0.54 6.66 15.40 0.27 0.65 0.96 43.0 Good

2009 rows × 16 columns

ds
2369 2021-06-27
2370 2021-06-28
2371 2021-06-29
2372 2021-06-30
2373 2021-07-01
ds yhat yhat_lower yhat_upper
2369 2021-06-27 51.442944 8.732539 93.997069
2370 2021-06-28 51.458122 8.943406 94.890603
2371 2021-06-29 54.972129 9.719688 98.774818
2372 2021-06-30 54.693472 14.603014 99.555499
2373 2021-07-01 53.862229 11.849875 96.461272
Cross Validation accuracy: 63.49849589422245

With this model we obtain a cross-validation accuracy of 63.50 for Bengaluru, which we consider adequate. The model can therefore be used to forecast AQI trends for Bengaluru over the upcoming years.

Printing the trend of AQI in Bengaluru for upcoming years and finding the yearly, monthly, and weekly behaviour.

IV) Forecasting Chennai's AQI for the upcoming years by using the dataset available¶

City Date PM2.5 PM10 NO NO2 NOx NH3 CO SO2 O3 Benzene Toluene Xylene AQI Air_quality
7834 Chennai 2015-01-01 48.485 95.71 16.30 15.39 22.68 4.59 1.17 9.20 11.35 0.17 2.96 0.96 118.0 Moderate
7835 Chennai 2015-01-02 48.485 95.71 16.49 13.42 23.09 7.83 1.23 8.61 9.16 0.13 2.96 0.96 118.0 Moderate
7836 Chennai 2015-01-03 48.485 95.71 9.72 19.56 9.99 4.63 0.77 48.23 13.45 0.03 2.96 0.96 118.0 Moderate
7837 Chennai 2015-01-04 48.485 95.71 9.60 16.20 11.71 5.23 1.00 27.96 10.33 1.07 2.96 0.96 118.0 Moderate
7838 Chennai 2015-01-05 48.485 95.71 9.16 16.30 12.94 5.50 0.90 16.60 9.36 0.57 2.96 0.96 118.0 Moderate
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
9838 Chennai 2020-06-27 26.420 39.30 7.25 12.96 19.59 33.20 1.10 7.29 68.51 0.10 0.07 0.96 95.0 Satisfactory
9839 Chennai 2020-06-28 25.930 45.54 7.81 10.00 16.39 35.98 0.76 6.48 77.45 0.09 2.96 0.96 98.0 Satisfactory
9840 Chennai 2020-06-29 21.300 22.21 7.65 9.69 16.74 34.07 0.96 6.62 62.57 0.09 0.01 0.96 104.0 Moderate
9841 Chennai 2020-06-30 24.140 30.66 8.42 12.38 20.29 34.17 1.05 7.50 68.75 0.17 0.16 0.96 110.0 Moderate
9842 Chennai 2020-07-01 15.950 4.85 6.22 10.72 16.44 33.52 1.02 9.23 48.37 0.09 2.96 0.96 92.0 Satisfactory

2009 rows × 16 columns

ds
2369 2021-06-27
2370 2021-06-28
2371 2021-06-29
2372 2021-06-30
2373 2021-07-01
ds yhat yhat_lower yhat_upper
2369 2021-06-27 65.917531 4.447421 124.194548
2370 2021-06-28 63.813983 5.102808 127.140649
2371 2021-06-29 69.038115 15.507342 127.347812
2372 2021-06-30 73.645892 16.467333 135.093176
2373 2021-07-01 73.117267 14.157471 128.271758
Cross Validation accuracy: 61.76724564183594

With this model we obtain a cross-validation accuracy of 61.77 for Chennai, which we consider adequate. The model can therefore be used to forecast AQI trends for Chennai over the upcoming years.

Printing the trend of AQI in Chennai for upcoming years and finding the yearly, monthly, and weekly behaviour.

In the four city examples above, this model gives a cross-validation accuracy above 60 in every case. This suggests that the model can be used to forecast the behaviour of AQI for various cities across India.¶